devenv

Containerized dev environment for compressed-tensors, llm-compressor, speculators, and vLLM.

Why

On shared GPU servers, keeping a consistent dev setup across the team is hard — different Python versions, conflicting deps, broken system packages. This repo provides a reproducible, pre-built container with everything needed to develop and test across all four repos. One command to start, no manual install steps, and you can't break the host.

Quick start

# Run the onboarding script (auto-detects bare-metal vs cluster)
git clone https://github.com/neuralmagic/devenv.git ~/devenv
cd ~/devenv
./setup.sh           # clones repos, sets up auth, creates PVCs (cluster), adds shell aliases
source ~/.bashrc     # loads aliases, env vars
devenv               # pulls image, starts container, attaches tmux

Container

The image is based on vllm/vllm-openai:latest and ships compressed-tensors, llm-compressor, speculators, and vLLM. It also includes Python, torch, transformers, pytest, ruff, Claude Code, VS Code CLI, gcloud CLI, gh CLI, uv, and tmux.

Usage

The launch script auto-detects the environment (bare-metal vs OpenShift cluster).

devenv                             # start container + attach tmux
devenv --down                      # stop/delete container
devenv --restart                   # restart with latest image
devenv --gpus 4                    # request specific GPU count (cluster mode)
devenv --gpu-type h100             # target specific GPU type (cluster mode)
devenv --name exp1                 # named instance (for multiple pods)
devenv --fast-storage              # mount tier 1 NVMe at /data (cluster mode)
devenv-status                      # show pods, GPU availability, PVCs (cluster mode)

Re-running devenv after an SSH disconnect reattaches to the existing tmux session.

Per-repo venvs

Each repo gets its own venv (with --system-site-packages to inherit torch/CUDA from the base image). This isolates conflicting deps like transformers across repos.

use speculators       # activates venv + cd into repo
use vllm
use llm-compressor

Venvs live at /workspace/<repo>/.venv and persist on the PVC. They're only rebuilt when pyproject.toml/setup.py changes.
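
As a rough illustration of the "only rebuilt when pyproject.toml/setup.py changes" behaviour, one way to implement it is to store a hash of those files inside the venv and compare it on each activation. This is a minimal sketch, not the actual devenv implementation: the function names, the .deps-hash stamp file, and the editable pip install are all assumptions made for the example.

```shell
# Hypothetical helpers; the real devenv scripts may work differently.

# Print a hash of the files that define a repo's dependencies.
deps_hash() {
  local repo="$1"
  cat "$repo/pyproject.toml" "$repo/setup.py" 2>/dev/null | sha256sum | cut -d' ' -f1
}

# Succeed (exit 0) when the venv has no stamp yet or the recorded hash is stale.
needs_rebuild() {
  local repo="$1"
  local stamp="$repo/.venv/.deps-hash"
  [ ! -f "$stamp" ] || [ "$(cat "$stamp")" != "$(deps_hash "$repo")" ]
}

# Recreate the venv on top of the image's site-packages and record the hash.
rebuild_venv() {
  local repo="$1"
  python3 -m venv --clear --system-site-packages "$repo/.venv"
  "$repo/.venv/bin/python" -m pip install -e "$repo"   # assumed install step
  deps_hash "$repo" > "$repo/.venv/.deps-hash"
}
```

A wrapper such as `needs_rebuild /workspace/vllm && rebuild_venv /workspace/vllm` would then rebuild only after a dependency file changes.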

Bare-metal setup

Prerequisites

  • Docker (or Podman) with NVIDIA GPU support
  • NVIDIA GPUs with CDI configured (nvidia.com/gpu=all)
  • Repos cloned at ~/repos/{compressed-tensors,llm-compressor,speculators,vllm}

Repos are bind-mounted from ~/repos/ into /workspace/ — edits on the host appear instantly in the container.
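
Before the first launch it can help to check the prerequisites above by hand. The sketch below is not part of the repo: the function names are made up for illustration, and the GPU check assumes the ubuntu image can be pulled and that the runtime has CDI device support enabled.

```shell
# Hypothetical preflight helpers, not shipped with devenv.

# Print each expected repo that is missing under the given root (default: ~/repos).
missing_repos() {
  local root="${1:-$HOME/repos}" repo
  for repo in compressed-tensors llm-compressor speculators vllm; do
    [ -d "$root/$repo" ] || echo "$repo"
  done
}

# List the GPUs visible through CDI; pass "podman" to use Podman instead of Docker.
gpu_smoke_test() {
  local runtime="${1:-docker}"
  "$runtime" run --rm --device nvidia.com/gpu=all ubuntu nvidia-smi -L
}
```

If `missing_repos` prints nothing and `gpu_smoke_test` lists the GPUs, the host should meet the requirements listed above.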

Auth

Authenticate on the host or inside the container — gcloud and gh configs are bind-mounted and shared between both:

gcloud auth login
gcloud config set project itpc-gcp-ai-eng-claude
gcloud auth application-default login
gcloud auth application-default set-quota-project cloudability-it-gemini
gh auth login
hf auth login

vLLM server

The container does not auto-start a server — start it manually:

use vllm
python -m vllm.entrypoints.openai.api_server --model <model> --port 8000

The server is reachable at localhost:8000 inside the container, and port 8000 is also exposed to the host.
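
Loading a model can take a while, so a script that talks to the server usually needs to wait until it is ready. A minimal sketch, assuming curl is installed and using vLLM's /health endpoint; the helper name and the default timeout are choices made for this example:

```shell
# Hypothetical helper: poll the server until it answers or the timeout expires.
# Usage: wait_for_server [base_url] [timeout_seconds]
wait_for_server() {
  local base_url="${1:-http://localhost:8000}"
  local timeout="${2:-300}"
  local waited=0
  until curl -sf -o /dev/null "$base_url/health"; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1   # gave up: server never became healthy
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

Once it returns successfully, `curl -s http://localhost:8000/v1/models` should list the served model.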

OpenShift cluster setup

Prerequisites

  • oc CLI, logged into the cluster
  • Namespace: machine-learning

First-time setup

Run ./setup.sh on the bastion — it handles oc login, PVC creation, and cloning repos onto the PVC.

Daily use

devenv --gpus 2                    # auto-detects cluster when oc is available
devenv --gpu-type h100             # target H100 nodes
devenv --down                      # delete the pod
devenv-status                      # show pods, GPU availability, PVCs
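
The auto-detection mentioned above could look roughly like the following. This is a sketch under assumptions, not the launch script's actual logic: the README only says cluster mode is chosen when oc is available, and the extra `oc whoami` login check is added here for illustration.

```shell
# Hypothetical sketch of environment detection.
# Prints "cluster" when an oc CLI is on PATH and has a logged-in session,
# otherwise "bare-metal".
detect_mode() {
  if command -v oc >/dev/null 2>&1 && oc whoami >/dev/null 2>&1; then
    echo cluster
  else
    echo bare-metal
  fi
}
```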

Auth persistence

Auth credentials (gcloud, gh, huggingface, Claude Code) and SSH keys persist across pod restarts via a config PVC mounted at /root/.config, /root/.claude, and /root/.ssh. Authenticate once inside the pod:

gcloud auth login
gcloud config set project itpc-gcp-ai-eng-claude
gcloud auth application-default login
gcloud auth application-default set-quota-project cloudability-it-gemini
gh auth login
hf auth login

Jira skills (optional)

After authenticating with gh, clone mle-jira and link the skills:

git clone https://github.com/neuralmagic/mle-jira.git /workspace/mle-jira
mkdir -p ~/.claude/skills
ln -sfn /workspace/mle-jira/skills/* ~/.claude/skills/
claude mcp add --transport http atlassian-mcp-server https://mcp.atlassian.com/v1/mcp

This setup persists across pod restarts via the workspace and config PVCs.

VS Code remote editing

A VS Code tunnel starts automatically on pod startup. On first use, authenticate via:

tmux switch -t tunnel    # follow the GitHub auth link

After that, connect from VS Code on your laptop using the Remote - Tunnels extension.

Multiple instances

Use --name to run multiple independent pods. Each instance gets its own repos PVC (auto-created with cloned repos on first launch), while hf-cache, pip-cache, and config remain shared.

devenv --name exp1 --gpus 2
devenv --name exp2 --gpus 1    # separate repos PVC

To tear down, pass the same --name:

devenv --name exp1 --down      # deletes pod, keeps repos PVC
oc delete pvc devenv-workspace-$USER-exp1 -n machine-learning   # delete repos PVC
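
When cleaning up several named instances, the PVC name can be derived instead of typed out. The helper below is hypothetical and simply follows the devenv-workspace-$USER-<name> pattern from the command above; the naming of the default (unnamed) instance's PVC is not covered.

```shell
# Hypothetical helpers built on the naming pattern shown above.

# Print the repos PVC name for a named instance (user defaults to $USER).
workspace_pvc() {
  local name="${1:?instance name required}"
  local user="${2:-$USER}"
  echo "devenv-workspace-${user}-${name}"
}

# Delete the repos PVC of a named instance in the machine-learning namespace.
delete_workspace_pvc() {
  oc delete pvc "$(workspace_pvc "$1")" -n machine-learning
}
```

For example, `delete_workspace_pvc exp2` removes the repos PVC created by `devenv --name exp2`.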
