diff --git a/content/en/events/upcoming-events/gsoc-2026.md b/content/en/events/upcoming-events/gsoc-2026.md
index 8e1615be7f..ed2529cea1 100644
--- a/content/en/events/upcoming-events/gsoc-2026.md
+++ b/content/en/events/upcoming-events/gsoc-2026.md
@@ -257,3 +257,43 @@ This will therefore also include working with maintainers of other components su
 - GitHub Actions
 - Bash
 - Community Coordination
+
+### Project 6: MCP Server for Kubeflow SDK
+
+**Components:** [kubeflow/sdk](https://github.com/kubeflow/sdk), [kubeflow/trainer](https://github.com/kubeflow/trainer)
+
+**Mentors:** [@jaiakash](https://github.com/jaiakash), [@dhanishaphadate](https://github.com/dhanishaphadate), [@abhijeet-dhumal](https://github.com/abhijeet-dhumal)
+
+**Contributor:** [TBD]
+
+**Details:**
+The Kubeflow SDK allows users with limited Kubernetes knowledge to use standard Python APIs to interact with the Kubeflow ecosystem. Documentation: https://sdk.kubeflow.org/en/latest/index.html
+
+Most of us use LLMs to write and debug code for jobs, models, etc., but there is currently no mechanism for an LLM to see TrainJob status, debug a crash loop, or provide consolidated metrics about previous tasks. We want to improve the developer experience (DX) by building a Model Context Protocol (MCP) server for the Kubeflow ecosystem.
+
+There is an existing issue, [kubeflow/community#936](https://github.com/kubeflow/community/issues/936), and an MVP for this project. The contributor will extend the MCP server to cover additional use cases, improve error handling, add comprehensive documentation, and potentially integrate with other Kubeflow components such as Model Registry.
+
+**Core Deliverables:**
+
+- MCP tools for TrainJob lifecycle (`fine_tune`, `get_training_job`, `list_training_jobs`, `delete_training_job`)
+- Pre-flight validation (`get_cluster_resources`, `estimate_resources`, `check_training_prerequisites`)
+- Job observability (`get_training_logs`, `get_job_events`)
+- Storage setup (`setup_training_storage`)
+
+**Stretch Goals:**
+- Policy-based access control (persona-based RBAC)
+- Custom trainer support (`run_custom_training`, `run_container_job`)
+- Integration with Model Registry MCP catalog
+- Progress tracking (pending [KEP-937](https://github.com/kubeflow/community/pull/937))
+
+Tracking issue: https://github.com/kubeflow/sdk/issues/238
+
+**Difficulty:** Medium
+
+**Size:** 175 hours (Medium)
+
+**Skills Required/Preferred:**
+- Experience with LLM / MCP development.
+- Familiarity with the Kubeflow SDK and Trainer codebase.
+- Understanding of the Kubeflow ecosystem and basic Kubernetes concepts.
+- Willingness to engage with and contribute to the Kubeflow community on Slack and GitHub.