Merged
40 changes: 40 additions & 0 deletions content/en/events/upcoming-events/gsoc-2026.md
@@ -257,3 +257,43 @@ This will therefore also include working with maintainers of other components su
- GitHub Actions
- Bash
- Community Coordination

### Project 6: MCP Server for Kubeflow SDK

**Components:** [kubeflow/sdk](https://github.com/kubeflow/sdk), [kubeflow/trainer](https://github.com/kubeflow/trainer)

**Mentors:** [@jaiakash](https://github.com/jaiakash), [@dhanishaphadate](https://github.com/dhanishaphadate), [@abhijeet-dhumal](https://github.com/abhijeet-dhumal)

**Contributor:** [TBD]

**Details:**
The Kubeflow SDK allows users with limited Kubernetes knowledge to use standard Python APIs to interact with the Kubeflow ecosystem. Documentation: [sdk.kubeflow.org](https://sdk.kubeflow.org/en/latest/index.html)

Many of us already use LLMs to write and debug code for training jobs and models, but today an LLM has no way to see TrainJob status, debug a crash loop, or summarize metrics from previous jobs. We want to improve the developer experience (DX) by providing a Model Context Protocol (MCP) server for the Kubeflow ecosystem.
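
To illustrate what "letting the LLM see TrainJob status" means at the protocol level, here is a minimal, stdlib-only sketch of how an MCP server might dispatch the JSON-RPC 2.0 `tools/list` and `tools/call` requests. The `FAKE_JOBS` data, the `handle` function, and the two tool implementations are hypothetical; a real server would more likely build on an MCP SDK and read job state from the cluster (for example through the Kubeflow SDK) instead of an in-memory dictionary.

```python
import json
from typing import Any, Callable

# Hypothetical in-memory stand-in for cluster state; a real server
# would query Kubernetes instead.
FAKE_JOBS = {
    "llama-ft-1": {"status": "Running", "restarts": 0},
    "bert-ft-2": {"status": "Failed", "restarts": 5},
}

def get_training_job(name: str) -> dict:
    """Return the status of one TrainJob."""
    if name not in FAKE_JOBS:
        raise LookupError(f"TrainJob {name!r} not found")
    return {"name": name, **FAKE_JOBS[name]}

def list_training_jobs() -> list:
    """Return the names of all TrainJobs."""
    return sorted(FAKE_JOBS)

# Tool name -> (implementation, JSON Schema for its arguments).
TOOLS: dict[str, tuple[Callable[..., Any], dict]] = {
    "get_training_job": (
        get_training_job,
        {"type": "object", "properties": {"name": {"type": "string"}}, "required": ["name"]},
    ),
    "list_training_jobs": (list_training_jobs, {"type": "object", "properties": {}}),
}

def handle(request: dict) -> dict:
    """Dispatch one JSON-RPC 2.0 request for the MCP tools/list and tools/call methods."""
    req_id = request.get("id")
    method = request.get("method")
    if method == "tools/list":
        result = {"tools": [
            {"name": name, "description": fn.__doc__ or "", "inputSchema": schema}
            for name, (fn, schema) in TOOLS.items()
        ]}
    elif method == "tools/call":
        params = request.get("params", {})
        name = params.get("name")
        if name not in TOOLS:
            # Unknown tool: protocol-level "invalid params" error.
            return {"jsonrpc": "2.0", "id": req_id,
                    "error": {"code": -32602, "message": f"Unknown tool: {name}"}}
        fn, _ = TOOLS[name]
        try:
            payload = fn(**params.get("arguments", {}))
            result = {"content": [{"type": "text", "text": json.dumps(payload)}], "isError": False}
        except Exception as exc:
            # Tool failures go into the result so the LLM can see and react to them.
            result = {"content": [{"type": "text", "text": str(exc)}], "isError": True}
    else:
        return {"jsonrpc": "2.0", "id": req_id,
                "error": {"code": -32601, "message": "Method not found"}}
    return {"jsonrpc": "2.0", "id": req_id, "result": result}
```

Reporting a failed tool call with `isError: true` inside the result, rather than as a JSON-RPC error, keeps the failure message visible to the model, which is the kind of error handling that makes crash-loop debugging through an LLM practical.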

We have an open issue, [kubeflow/community#936](https://github.com/kubeflow/community/issues/936), and an existing MVP for this project. The contributor will extend the MCP server to cover additional use cases, improve error handling, add comprehensive documentation, and potentially integrate with other Kubeflow components such as Model Registry.

Copilot AI Feb 2, 2026

The reference format is inconsistent with markdown conventions. The link text "kubeflow/community#936" should be descriptive text, not repository reference notation. Consider changing this to match the pattern used elsewhere in the document, such as: "We have an open issue tracking this work and an existing MVP for this project."

Suggested change:

- Before: We have a [kubeflow/community#936](https://github.com/kubeflow/community/issues/936) and an existing MVP for this project. The contributor will extend the MCP server to cover additional use cases, improve error handling, add comprehensive documentation, and potentially integrate with other Kubeflow components like Model Registry.
- After: We have an [open tracking issue](https://github.com/kubeflow/community/issues/936) and an existing MVP for this project. The contributor will extend the MCP server to cover additional use cases, improve error handling, add comprehensive documentation, and potentially integrate with other Kubeflow components like Model Registry.


**Core Deliverables:**

- MCP tools for TrainJob lifecycle (`fine_tune`, `get_training_job`, `list_training_jobs`, `delete_training_job`)
- Pre-flight validation (`get_cluster_resources`, `estimate_resources`, `check_training_prerequisites`)
- Job observability (`get_training_logs`, `get_job_events`)
- Storage setup (`setup_training_storage`)
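
One way the pre-flight validation could work is sketched below: `check_training_prerequisites` takes a requested job shape and the free capacity of each node and reports whether enough nodes can each host one training worker. Only the tool name comes from the deliverables list; the `NodeCapacity` and `TrainingRequest` types, the one-worker-per-node simplification, and the return format are hypothetical, and a real implementation would read node capacity from the cluster (for example via `get_cluster_resources`).

```python
from dataclasses import dataclass

@dataclass
class NodeCapacity:
    """Free resources on one node (hypothetical shape of cluster data)."""
    name: str
    free_gpus: int
    free_memory_gib: float

@dataclass
class TrainingRequest:
    """Requested job shape, assuming one training worker per node."""
    num_nodes: int
    gpus_per_node: int
    memory_gib_per_node: float

def check_training_prerequisites(request: TrainingRequest, nodes: list[NodeCapacity]) -> dict:
    """Report whether enough nodes can each host one worker of the requested size."""
    # Nodes with enough free GPUs and memory for a single worker.
    candidates = [
        n.name for n in nodes
        if n.free_gpus >= request.gpus_per_node
        and n.free_memory_gib >= request.memory_gib_per_node
    ]
    ok = len(candidates) >= request.num_nodes
    # Structured, human-readable problems let the LLM explain a failure before submission.
    problems = [] if ok else [
        f"need {request.num_nodes} node(s) with >= {request.gpus_per_node} free GPU(s) "
        f"and >= {request.memory_gib_per_node} GiB free memory; found {len(candidates)}"
    ]
    return {"ok": ok, "candidate_nodes": candidates, "problems": problems}
```

For example, with one node that has 4 free GPUs and another with only 2, a request for two 4-GPU workers fails the check, and the `problems` entry says why. Returning this information instead of raising lets the assistant suggest a smaller job shape before anything is submitted.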

**Stretch Goals:**
- Policy-based access control (persona-based RBAC)
- Custom trainer support (`run_custom_training`, `run_container_job`)
- Integration with Model Registry MCP catalog
- Progress tracking (pending [KEP-937](https://github.com/kubeflow/community/pull/937))
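
For the policy-based access control stretch goal, a minimal sketch is an allowlist that maps each persona to the MCP tools it may list and call, denying unknown personas by default. The persona names (`viewer`, `ml-engineer`) and the split of tools into read-only and write sets are assumptions for illustration only; the tool names are taken from the deliverables list above.

```python
# Hypothetical grouping of the deliverable tools into read-only and write sets.
READ_ONLY_TOOLS = frozenset({
    "get_training_job", "list_training_jobs", "get_training_logs",
    "get_job_events", "get_cluster_resources", "estimate_resources",
    "check_training_prerequisites",
})
WRITE_TOOLS = frozenset({"fine_tune", "delete_training_job", "setup_training_storage"})

# Hypothetical personas: each maps to the set of tools it may use.
PERSONA_POLICIES: dict[str, frozenset[str]] = {
    "viewer": READ_ONLY_TOOLS,
    "ml-engineer": READ_ONLY_TOOLS | WRITE_TOOLS,
}

def allowed_tools(persona: str, all_tools: list[str]) -> list[str]:
    """Filter the advertised tool list for a persona; unknown personas get nothing."""
    allowed = PERSONA_POLICIES.get(persona, frozenset())
    return [tool for tool in all_tools if tool in allowed]

def authorize(persona: str, tool: str) -> None:
    """Raise PermissionError unless the persona may call the tool."""
    if tool not in PERSONA_POLICIES.get(persona, frozenset()):
        raise PermissionError(f"persona {persona!r} may not call {tool!r}")
```

A server-side filter like this would sit on top of, not replace, the Kubernetes RBAC permissions of the credentials the MCP server itself uses.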

Tracking issue: [kubeflow/sdk#238](https://github.com/kubeflow/sdk/issues/238)

Copilot AI Feb 2, 2026

The tracking issue link format is inconsistent with other projects in this file. Project 2 uses the format [kubeflow/katib#2605](https://github.com/kubeflow/katib/issues/2605) (line 152), but this project uses a plain URL. For consistency, consider formatting this as: Tracking issue: [kubeflow/sdk#238](https://github.com/kubeflow/sdk/issues/238)


**Difficulty:** Medium

**Size:** 175 hours (Medium)

**Skills Required/Preferred:**
- Experience with LLM and MCP development.
- Familiarity with the Kubeflow SDK and Trainer codebases.
- Understanding of the Kubeflow ecosystem and basic Kubernetes concepts.
- Willingness to engage with and contribute to the Kubeflow community on Slack and GitHub.