chore(docs): katib (optimizer) client support for kubeflow-mcp-server (kep#0001) by Krishna-kg732 · Pull Request #48 · kubeflow/mcp-server

Krishna-kg732 · 2026-06-30T13:24:23Z

Summary

This KEP proposes implementing the Optimizer client module for
kubeflow-mcp-server, exposing Katib's Experiment, Trial, and Suggestion
lifecycle as 17 MCP tools across 5 categories (Planning, Optimization,
Discovery, Monitoring, Lifecycle).

This implements the "Optimizer (Planned: Phase 2)" node already identified
in the architecture and stub module (kubeflow_mcp.optimizer), making Katib
the natural second client after TrainerClient — completing the inner loop of
train -> evaluate -> tune -> retrain without leaving the MCP interface.

Motivation

AI IDEs and orchestrator agents currently have no structured way to:

- Launch Katib HPO experiments from natural language descriptions
- Inspect experiment progress, individual trial results, or suggestion
  algorithm status
- Retrieve the best hyperparameter configuration from a completed experiment
- Integrate HPO into automated ML pipelines managed by agents

The existing stub declares 8 planned tools with status: "stub" and no
implementations.

Goals

- Implement 17 MCP tools across Planning, Optimization, Discovery,
  Monitoring, and Lifecycle categories
- Decompose experiment creation into agent-friendly tools
  (create_hpo_experiment / create_experiment_from_spec), following the
  trainer's fine_tune/run_custom_training pattern
- Use kubeflow.katib.KatibClient as the primary interface, with
  CustomObjectsApi fallback where the SDK lacks coverage
- Integrate with existing server infrastructure (personas, audit, rate
  limiting, circuit breaker, namespace enforcement)
- Two-phase confirmation for all mutating operations
- Unit tests at ≥80% coverage, plus integration tests against a live Katib
  install
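
To make the two-phase confirmation goal concrete, here is a minimal sketch of how a mutating tool could behave. It is an illustration only, not the KEP's implementation: `FakeCluster` is a hypothetical in-memory stand-in (no Katib or Kubernetes calls are made), and the signature and return shape of `create_hpo_experiment` are assumptions. Only the tool name and the `confirmed=False` preview convention come from the KEP excerpts.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FakeCluster:
    """Hypothetical in-memory stand-in for the cluster; not a real API."""
    experiments: dict[str, dict[str, Any]] = field(default_factory=dict)


def create_hpo_experiment(
    cluster: FakeCluster,
    name: str,
    spec: dict[str, Any],
    confirmed: bool = False,
) -> dict[str, Any]:
    """Assumed shape of a two-phase tool: preview unless confirmed=True."""
    if not confirmed:
        # Phase 1: describe what would be created without changing anything.
        return {
            "status": "preview",
            "name": name,
            "spec": spec,
            "next": "Call again with confirmed=True to create.",
        }
    # Phase 2: the caller has seen the preview and explicitly opted in.
    cluster.experiments[name] = spec
    return {"status": "created", "name": name}
```

The agent calls the tool once with the default `confirmed=False`, shows the preview to the user, and repeats the call with `confirmed=True` only after approval.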

Non-Goals

- NAS (Neural Architecture Search): follow-up after HPO
- Custom suggestion algorithm deployment
- Katib UI replacement
- Direct Katib DB Manager access (get_trial_metrics(), requires gRPC)
- Wrapping the tune() API (requires Python callables, not MCP-serializable)
- edit_experiment_budget(): deferred

Details

The full design (module structure, tool tables, persona coverage, SDK
compatibility, cross-client integration with TrainerClient, risks and
mitigations, and the testing plan) is in the KEP doc itself.

Status

Open for review and discussion before implementation begins.

cc: @jaiakash , @abhijeet-dhumal , @andreyvelich

google-oss-prow · 2026-06-30T13:24:31Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign abhijeet-dhumal for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Signed-off-by: Krishna Gupta <Krishnagupta.kg2k6@gmail.com>

Copilot

Pull request overview

Adds a new KEP document proposing Katib (Optimizer) client support in kubeflow-mcp-server, describing the planned MCP tool surface, module structure, personas, and testing approach for integrating hyperparameter optimization workflows alongside the existing Trainer client.

Changes:

Introduces KEP#0001 describing the Optimizer client scope (17 tools across Planning/Optimization/Discovery/Monitoring/Lifecycle).
Documents proposed module layout, tool phase grouping, persona access, and cross-client workflow (train -> tune -> retrain).
Outlines compatibility assumptions, risks/mitigations, and a unit/integration testing plan.


@@ -0,0 +1,429 @@
+---


+- When both clients active, trainer.pre_flight() covers cluster/GPU;
+  katib_pre_flight() covers Katib-specific readiness


+- ALWAYS preview first (confirmed=False)
+- maxTrialCount is required, no unbounded default
+- Use early_stopping with medianstop for long-running trials
+- Trial templates can reference TrainJobs — use trainer.list_runtimes() first
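
The guidance above (a required `maxTrialCount` and `medianstop` early stopping for long trials) could be enforced before submission with a check like the following. This is a sketch under assumptions: `check_experiment_spec` is a hypothetical helper, not part of the Katib SDK or the KEP, and it only inspects a plain dict shaped like a Katib Experiment `spec` (`maxTrialCount`, `earlyStopping.algorithmName`).

```python
from typing import Any


def check_experiment_spec(spec: dict[str, Any]) -> dict[str, list[str]]:
    """Hypothetical pre-submit check; not part of the Katib SDK."""
    errors: list[str] = []
    hints: list[str] = []

    max_trials = spec.get("maxTrialCount")
    # bool is a subclass of int in Python, so reject it explicitly.
    if (
        isinstance(max_trials, bool)
        or not isinstance(max_trials, int)
        or max_trials < 1
    ):
        errors.append("maxTrialCount is required and must be a positive integer")

    # Early stopping is a recommendation here, so it yields a hint, not an error.
    early = spec.get("earlyStopping") or {}
    if early.get("algorithmName") != "medianstop":
        hints.append(
            "consider earlyStopping with algorithmName 'medianstop' "
            "for long-running trials"
        )

    return {"errors": errors, "hints": hints}
```

A preview response could surface `errors` as blocking and `hints` as advisory, so the agent sees both before confirming.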


+trainer.pre_flight()               # Validate cluster, GPU availability
+katib_pre_flight()                 # Validate Katib readiness
+trainer.list_runtimes()            # Find training runtimes
+create_hpo_experiment()            # Create experiment with TrainJob trial template


+create_hpo_experiment()            # Create experiment with TrainJob trial template
+wait_for_experiment()              # Wait for completion
+get_best_trial()                   # Get optimal hyperparameters
+trainer.fine_tune()                # Retrain with best config


+
+| Tool | Next Hint |
+|------|-----------|
+| `trainer.wait_for_training` | "Use `create_hpo_experiment()` to optimize hyperparameters" |


+| Persona | Optimizer Tools |
+|---------|----------------|
+| `readonly` | All read-only tools (13 tools: planning + discovery + monitoring) |
+| `data-scientist` | readonly + `create_hpo_experiment`, `wait_for_experiment`, `delete_experiment` (MCP-owned only) |


+1. Implement 17 MCP tools across 5 categories for Katib experiment, trial,
+   and suggestion lifecycle (see MCP Tools tables)


+"optimizer": {
+    "status": "implemented",
+    "sdk_client": "kubeflow.katib.KatibClient",
+    "sdk_version_min": "0.19.0",
+    "covered_methods": [


Copilot AI review requested due to automatic review settings June 30, 2026 13:24

google-oss-prow Bot requested review from abhijeet-dhumal and andreyvelich June 30, 2026 13:24

google-oss-prow Bot requested a review from szaher June 30, 2026 13:24

google-oss-prow Bot added the size/L label Jun 30, 2026

docs(KEP): add KEP#0001 — Katib (Optimizer) client support

06df563

Signed-off-by: Krishna Gupta <Krishnagupta.kg2k6@gmail.com>

Krishna-kg732 force-pushed the docs/KEP_katib_support branch from 6eef165 to 06df563 Compare June 30, 2026 13:27

Krishna-kg732 changed the title ~~docs(KEP) : Katib (Optimizer) Client Support for kubeflow-mcp-server (KEP#0001)~~ chore(docs): Katib (Optimizer) Client Support for kubeflow-mcp-server (KEP#0001) Jun 30, 2026

Copilot AI reviewed Jun 30, 2026


Krishna-kg732 changed the title ~~chore(docs): Katib (Optimizer) Client Support for kubeflow-mcp-server (KEP#0001)~~ chore(docs): katib (optimizer) client support for kubeflow-mcp-server (kep#0001) Jun 30, 2026

Krishna-kg732 wants to merge 1 commit into kubeflow:main from Krishna-kg732:docs/KEP_katib_support
