
fix(trainer): detect OpenShift pip permission error and document work…#46

Open
priyank766 wants to merge 1 commit into
kubeflow:mainfrom
priyank766:feat/openshift-pip-fix

@priyank766 priyank766 commented Jun 25, 2026

fix(trainer): detect OpenShift pip permission error and document workaround

Problem

Fixes #41.
On OpenShift clusters with a restricted Security Context Constraint (SCC), the root filesystem is read-only. Calling run_custom_training(packages=[...]) runs pip install --user, which writes to /.local and fails with PermissionError: [Errno 13] Permission denied: '/.local', because user-defined volumes are not mounted during the pre-script installation step.

Changes

  • Log Parser (kubeflow_mcp/trainer/api/monitoring.py): Added an OpenShift-specific pattern to _FAILURE_PATTERNS to catch permission errors on /.local and return the workaround suggestion.
  • Agent Instructions (kubeflow_mcp/trainer/__init__.py): Added prompt guidance under INSTRUCTION_SECTIONS["training"] advising AI agents to avoid the packages parameter on OpenShift.
  • Documentation (kubeflow_mcp/trainer/resources/platform-fixes.md): Documented the issue and the target-directory workaround.
  • Tests (tests/unit/trainer/test_monitoring.py): Added a new test suite to verify the parser correctly matches and prioritizes the OpenShift error over generic permission errors.
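
The prioritization described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the list layout, the `diagnose` helper, the regular expressions, and the suggestion strings are all assumptions, and the real `_FAILURE_PATTERNS` in monitoring.py may be structured differently. The idea is simply that the more specific OpenShift pattern is checked before the generic permission pattern, so it wins when both would match.

```python
import re
from typing import Optional

# Hypothetical structure: (compiled pattern, suggestion) pairs checked in order.
# The specific OpenShift entry comes first so it takes priority over the
# generic permission-error entry.
FAILURE_PATTERNS = [
    (
        re.compile(r"Permission denied: '/\.local'"),
        "OpenShift restricted SCC: avoid `packages`; run pip install --target "
        "on a writable directory inside the training script.",
    ),
    (
        re.compile(r"PermissionError: \[Errno 13\]"),
        "Generic permission error: check volume mounts and the security context.",
    ),
]


def diagnose(log_text: str) -> Optional[str]:
    """Return the suggestion for the first matching pattern, or None."""
    for pattern, suggestion in FAILURE_PATTERNS:
        if pattern.search(log_text):
            return suggestion
    return None
```

Because the first match wins, the order of the entries encodes the priority; a test can assert that a `/.local` error yields the OpenShift suggestion while any other Errno 13 falls through to the generic one.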

Workaround

Do not use the packages parameter on OpenShift. Instead, run the installation within the training script using:

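import os
import subprocess
import sys

lib_dir = '/workspace/lib'  # writable directory inside the training pod
os.makedirs(lib_dir, exist_ok=True)
# Install into lib_dir instead of the unwritable /.local used by pip install --user.
subprocess.run([sys.executable, '-m', 'pip', 'install', '--target', lib_dir, '--quiet', 'transformers', 'peft', 'trl'], check=True)
sys.path.insert(0, lib_dir)  # make the freshly installed packages importable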
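
One detail worth noting: sys.path.insert only affects the current interpreter. If the training script launches child Python processes, those will not see lib_dir unless PYTHONPATH is also set. The sketch below separates building the pip command from exposing the directory; the helper names `build_pip_target_command` and `expose_lib_dir` are hypothetical and are not part of this PR or of the Kubeflow SDK.

```python
import os
import sys
from typing import List, Sequence


def build_pip_target_command(packages: Sequence[str], lib_dir: str) -> List[str]:
    """Build the argv for installing packages into lib_dir (hypothetical helper)."""
    return [sys.executable, "-m", "pip", "install", "--target", lib_dir, "--quiet", *packages]


def expose_lib_dir(lib_dir: str) -> None:
    """Make lib_dir importable here and in child processes (hypothetical helper)."""
    # Current interpreter: put lib_dir at the front of the import path.
    if lib_dir not in sys.path:
        sys.path.insert(0, lib_dir)
    # Child processes: they inherit os.environ, so prepend lib_dir to PYTHONPATH.
    parts = [p for p in os.environ.get("PYTHONPATH", "").split(os.pathsep) if p]
    if lib_dir not in parts:
        os.environ["PYTHONPATH"] = os.pathsep.join([lib_dir, *parts])
```

In a training script this would be used as subprocess.run(build_pip_target_command(['transformers', 'peft', 'trl'], lib_dir), check=True) followed by expose_lib_dir(lib_dir).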

Verification

  • All 147 unit tests pass.
  • Ruff checks and formatting pass with zero violations.

…around

Signed-off-by: priyank <priyank8445@gmail.com>
Copilot AI review requested due to automatic review settings June 25, 2026 11:15
@google-oss-prow

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign abhijeet-dhumal for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@github-actions

🎉 Welcome to the Kubeflow MCP Server! 🎉

Thanks for opening your first PR! We're happy to have you as part of our community 🚀

Here's what happens next:

  • If you haven't already, please check out our Contributing Guide for repo-specific guidelines and the Kubeflow Contributor Guide for general community standards
  • Our team will review your PR soon! cc @kubeflow/kubeflow-sdk-team

Join the community:

Feel free to ask questions in the comments if you need any help or clarification!
Thanks again for contributing to Kubeflow! 🙏

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Development

Successfully merging this pull request may close these issues.

run_custom_training packages parameter fails on OpenShift (read-only /.local)

2 participants