Containerized dev environment for compressed-tensors, llm-compressor, speculators, and vLLM.
On shared GPU servers, keeping a consistent dev setup across the team is hard — different Python versions, conflicting deps, broken system packages. This repo provides a reproducible, pre-built container with everything needed to develop and test across all four repos. One command to start, no manual install steps, and you can't break the host.
# Run the onboarding script (auto-detects bare-metal vs cluster)
git clone https://github.com/neuralmagic/devenv.git ~/devenv
cd ~/devenv
./setup.sh # clones repos, sets up auth, creates PVCs (cluster), adds shell aliases
source ~/.bashrc # loads aliases, env vars
devenv # pulls image, starts container, attaches tmux
The image is based on vllm/vllm-openai:latest with compressed-tensors, llm-compressor, speculators, and vllm installed. It includes Python, torch, transformers, pytest, ruff, Claude Code, VS Code CLI, gcloud CLI, gh CLI, uv, and tmux.
The launch script auto-detects the environment (bare-metal vs OpenShift cluster).
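The detection rule is stated in the cluster section (cluster mode is picked when oc is available). Below is a minimal sketch of that rule. It assumes the only signal is whether the oc CLI is on PATH. The function name detect_mode is hypothetical, and the real launch script may check more, for example whether oc is actually logged in.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: choose the launch mode the way this README describes,
# i.e. "cluster" when the oc CLI is available, "bare-metal" otherwise.
# Assumption: no other checks (such as "oc whoami") are performed.
detect_mode() {
  if command -v oc >/dev/null 2>&1; then
    echo cluster
  else
    echo bare-metal
  fi
}

# Example:
#   case "$(detect_mode)" in
#     cluster)    echo "would create a pod in namespace machine-learning" ;;
#     bare-metal) echo "would start a local Docker/Podman container" ;;
#   esac
```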
devenv # start container + attach tmux
devenv --down # stop/delete container
devenv --restart # restart with latest image
devenv --gpus 4 # request a specific GPU count (cluster mode)
devenv --gpu-type h100 # target a specific GPU type (cluster mode)
devenv --name exp1 # named instance (for multiple pods)
devenv --fast-storage # mount tier 1 NVMe at /data (cluster mode)
devenv-status # show pods, GPU availability, PVCs (cluster mode)
Re-running devenv after an SSH disconnect reattaches to the existing tmux session.
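The sketch below shows how a launcher could parse the flag combinations above into variables. It is hypothetical: the DEVENV_* names and the validation rules are assumptions for illustration and do not come from the actual devenv script.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: parse the devenv flags shown above into variables.
# DEVENV_* names and the validation rules are illustrative assumptions.
parse_devenv_args() {
  DEVENV_ACTION=up          # up | down | restart
  DEVENV_GPUS=""            # empty means "use the launcher's default"
  DEVENV_GPU_TYPE=""        # e.g. h100 (cluster mode)
  DEVENV_NAME=""            # empty means the default, unnamed instance
  DEVENV_FAST_STORAGE=0     # 1 = mount tier 1 NVMe at /data (cluster mode)

  while [ $# -gt 0 ]; do
    case "$1" in
      --down)         DEVENV_ACTION=down ;;
      --restart)      DEVENV_ACTION=restart ;;
      --fast-storage) DEVENV_FAST_STORAGE=1 ;;
      --gpus|--gpu-type|--name)
        if [ $# -lt 2 ]; then
          echo "devenv: $1 requires a value" >&2
          return 2
        fi
        case "$1" in
          --gpus)     DEVENV_GPUS="$2" ;;
          --gpu-type) DEVENV_GPU_TYPE="$2" ;;
          --name)     DEVENV_NAME="$2" ;;
        esac
        shift ;;
      *)
        echo "devenv: unknown option: $1" >&2
        return 2 ;;
    esac
    shift
  done

  # --gpus must be a positive integer when given.
  if [ -n "$DEVENV_GPUS" ] && ! [[ "$DEVENV_GPUS" =~ ^[1-9][0-9]*$ ]]; then
    echo "devenv: --gpus expects a positive integer, got '$DEVENV_GPUS'" >&2
    return 2
  fi
}
```

For the reattach behaviour, tmux new-session -A -s <session> attaches to the named session if it exists and creates it otherwise. Whether devenv uses exactly this command is an assumption.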
Each repo gets its own venv (created with --system-site-packages so it inherits torch/CUDA from the base image). This keeps dependencies that conflict between repos, such as transformers versions, isolated from each other.
use speculators # activates venv + cd into repo
use vllm
use llm-compressor
Venvs live at /workspace/<repo>/.venv and persist on the PVC. They're only rebuilt when pyproject.toml or setup.py changes.
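The per-repo venv behaviour described above can be sketched as a shell function: the venv inherits torch/CUDA and is rebuilt only when pyproject.toml or setup.py changes. This is a hypothetical illustration, not the devenv implementation. Three parts are assumptions: the stamp file, the _devenv_install hook, and installing with uv instead of pip.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a "use <repo>" helper; not the real devenv code.
# Assumptions: repos live under $WORKSPACE (default /workspace), a stamp file
# inside the venv records the packaging-file hash, and uv installs the repo.

# Hash pyproject.toml and setup.py (whichever exist) into one fingerprint.
_devenv_fingerprint() {
  local dir="$1" f
  for f in pyproject.toml setup.py; do
    if [ -f "$dir/$f" ]; then
      printf '%s ' "$f"
      sha256sum < "$dir/$f"
    fi
  done | sha256sum | cut -d' ' -f1
}

# Install hook run after the venv is (re)created; override to customize.
# Arguments: repo directory, venv directory.
_devenv_install() {
  uv pip install --python "$2/bin/python" -e "$1"
}

use() {
  if [ $# -ne 1 ]; then
    echo "usage: use <repo>" >&2
    return 2
  fi
  local dir="${WORKSPACE:-/workspace}/$1"
  local venv="$dir/.venv"
  local stamp="$venv/.devenv-stamp"
  local want
  if [ ! -d "$dir" ]; then
    echo "use: no repo at $dir" >&2
    return 1
  fi

  want="$(_devenv_fingerprint "$dir")"
  # Rebuild only if the venv has no stamp or the packaging files changed.
  if [ ! -f "$stamp" ] || [ "$(cat "$stamp")" != "$want" ]; then
    if declare -F deactivate >/dev/null; then
      deactivate                # leave any active venv before deleting it
    fi
    rm -rf "$venv"
    # --system-site-packages: inherit torch/CUDA from the base image.
    # --without-pip: packages are installed with uv instead (assumption).
    python3 -m venv --system-site-packages --without-pip "$venv" || return 1
    _devenv_install "$dir" "$venv" || return 1
    printf '%s\n' "$want" > "$stamp"
  fi

  cd "$dir" || return 1
  # shellcheck disable=SC1091
  . "$venv/bin/activate"
}
```

The rebuild is keyed on a hash of the packaging files rather than on timestamps. That keeps the check stable when files are touched or re-cloned without any real dependency change.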
Bare-metal prerequisites:
- Docker (or Podman) with NVIDIA GPU support
- NVIDIA GPUs with CDI configured (nvidia.com/gpu=all)
- Repos cloned at ~/repos/{compressed-tensors,llm-compressor,speculators,vllm}
Repos are bind-mounted from ~/repos/ into /workspace/ — edits on the host appear instantly in the container.
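The sketch below shows a bare-metal launch that combines the prerequisites and mounts above. This README describes only the ~/repos to /workspace mount and the shared gcloud/gh configs. The container name, the port mapping, and the exact mount list are assumptions, and the image name in the example is a placeholder.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a bare-metal launch; prints docker/podman arguments
# one per line. Container name, port mapping, and mounts are assumptions
# based on the prerequisites and bind mounts described in this README.
devenv_docker_args() {
  local image="$1"
  local user="${USER:-$(id -un)}"
  local -a args
  if [ -z "$image" ]; then
    echo "usage: devenv_docker_args <image>" >&2
    return 2
  fi
  args=(
    run -d
    --name "devenv-${user}"
    --device nvidia.com/gpu=all                     # all GPUs via CDI
    -p 8000:8000                                    # vLLM server port on the host
    -v "$HOME/repos:/workspace"                     # host edits show up instantly
    -v "$HOME/.config/gcloud:/root/.config/gcloud"  # shared gcloud auth
    -v "$HOME/.config/gh:/root/.config/gh"          # shared gh auth
    "$image"
    sleep infinity                                  # keep it alive for exec/tmux
  )
  printf '%s\n' "${args[@]}"
}

# Example (the image name is a placeholder):
#   mapfile -t args < <(devenv_docker_args registry.example.com/devenv:latest)
#   docker "${args[@]}"        # or: podman "${args[@]}"
```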
Authenticate on the host or inside the container — gcloud and gh configs are bind-mounted and shared between both:
gcloud auth login
gcloud config set project itpc-gcp-ai-eng-claude
gcloud auth application-default login
gcloud auth application-default set-quota-project cloudability-it-gemini
gh auth login
hf auth login
The container does not start a vLLM server automatically; start one manually:
use vllm
python -m vllm.entrypoints.openai.api_server --model <model> --port 8000
The server is reachable at localhost:8000 (the port is also exposed to the host).
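Loading a large model can take minutes, so a script that talks to the server may want to wait for it first. The sketch below assumes curl is available and that the server was started with --port 8000 as above. vLLM's OpenAI-compatible server answers GET /health once it is up. The function name wait_for_vllm is hypothetical.

```bash
#!/usr/bin/env bash
# Minimal sketch: block until the vLLM OpenAI-compatible server answers /health.
# Assumptions: curl is installed, and the server listens on port 8000 as above.
wait_for_vllm() {
  local base_url="${1:-http://localhost:8000}"
  local timeout="${2:-600}"        # seconds; large models can take minutes to load
  local deadline=$(( SECONDS + timeout ))
  until curl -sf -o /dev/null "$base_url/health"; do
    if (( SECONDS >= deadline )); then
      echo "wait_for_vllm: no healthy server at $base_url after ${timeout}s" >&2
      return 1
    fi
    sleep 2
  done
}

# Example:
#   wait_for_vllm && curl -s http://localhost:8000/v1/models
```

An open port alone is a weaker signal than an HTTP health check, which is why the sketch asks /health.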
Cluster prerequisites:
- oc CLI, logged into the cluster
- Namespace: machine-learning
Run ./setup.sh on the bastion — it handles oc login, PVC creation, and cloning repos onto the PVC.
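For reference, a workspace PVC like the ones setup.sh creates could be declared as below. This is a sketch under assumptions. The name pattern follows the teardown example later in this README. The default size, the access mode, and the unnamed-instance PVC name are guesses, and pvc_manifest is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the PVC creation that setup.sh performs on the cluster.
# The name pattern follows the teardown example in this README
# (devenv-workspace-$USER-<name>); size and access mode are assumptions.
pvc_manifest() {
  local name="$1" size="${2:-100Gi}"
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ${name}
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: ${size}
EOF
}

# Example (the unnamed-instance PVC name is assumed; creates it only if missing):
#   pvc="devenv-workspace-$USER"
#   oc get pvc "$pvc" -n machine-learning >/dev/null 2>&1 ||
#     pvc_manifest "$pvc" | oc apply -n machine-learning -f -
```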
devenv --gpus 2 # auto-detects cluster when oc is available
devenv --gpu-type h100 # target H100 nodes
devenv --down # delete the pod
devenv-status # show pods, GPU availability, PVCs
Auth credentials (gcloud, gh, huggingface, Claude Code) and SSH keys persist across pod restarts via a config PVC mounted at /root/.config, /root/.claude, and /root/.ssh. Authenticate once inside the pod:
gcloud auth login
gcloud config set project itpc-gcp-ai-eng-claude
gcloud auth application-default login
gcloud auth application-default set-quota-project cloudability-it-gemini
gh auth login
hf auth login
After authenticating with gh, clone mle-jira and link the skills:
git clone https://github.com/neuralmagic/mle-jira.git /workspace/mle-jira
mkdir -p ~/.claude/skills
ln -sfn /workspace/mle-jira/skills/* ~/.claude/skills/
claude mcp add --transport http atlassian-mcp-server https://mcp.atlassian.com/v1/mcp
This setup persists across pod restarts via the workspace and config PVCs.
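The ln -sfn line above links every skill into ~/.claude/skills. Below is a hypothetical helper that links each skill by its explicit name. In that form -n matters: it replaces an existing link to a directory instead of creating a new link inside it. The helper also reports links left dangling, for example after a skill is removed from mle-jira. The name link_skills and its default paths are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: link every entry of a skills directory into
# ~/.claude/skills (same effect as the ln -sfn line above) and
# report symlinks that no longer resolve.
link_skills() {
  local src="${1:-/workspace/mle-jira/skills}"
  local dest="${2:-$HOME/.claude/skills}"
  local entry link
  mkdir -p "$dest" || return 1
  for entry in "$src"/*; do
    [ -e "$entry" ] || continue                 # glob matched nothing
    # -n: replace an existing link to a directory instead of descending into it
    ln -sfn "$entry" "$dest/$(basename "$entry")" || return 1
  done
  for link in "$dest"/*; do
    if [ -L "$link" ] && [ ! -e "$link" ]; then
      echo "stale skill link: $link" >&2
    fi
  done
}
```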
A VS Code tunnel starts automatically on pod startup. On first use, authenticate via:
tmux switch -t tunnel # follow the GitHub auth link
After that, connect from VS Code on your laptop using the Remote - Tunnels extension.
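The tunnel runs in a tmux session named tunnel. Below is a sketch of a startup hook that launches it only once. It assumes the VS Code CLI is on PATH as code, and start_vscode_tunnel is a hypothetical name. The =tunnel target asks tmux for an exact session-name match, so a session such as tunnel2 is not mistaken for it.

```bash
#!/usr/bin/env bash
# Hypothetical startup hook: run the VS Code tunnel in a detached tmux session
# named "tunnel", starting it only if that session is not already running.
start_vscode_tunnel() {
  if tmux has-session -t =tunnel 2>/dev/null; then
    return 0                                  # already running; nothing to do
  fi
  # On first run, the session shows a GitHub device-login link to follow.
  tmux new-session -d -s tunnel 'code tunnel --accept-server-license-terms'
}
```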
Use --name to run multiple independent pods. Each instance gets its own repos PVC (auto-created with cloned repos on first launch), while hf-cache, pip-cache, and config remain shared.
devenv --name exp1 --gpus 2
devenv --name exp2 --gpus 1 # separate repos PVC
To tear down, pass the same --name:
devenv --name exp1 --down # deletes pod, keeps repos PVC
oc delete pvc devenv-workspace-$USER-exp1 -n machine-learning # delete repos PVC
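Each --name gets its own repos PVC, so deriving the PVC name with a small helper keeps the teardown command in sync with the instance name. Only the named pattern devenv-workspace-$USER-<name> comes from this README. The PVC name for an unnamed instance and the helper devenv_workspace_pvc are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: derive the repos PVC name for a devenv instance.
# The named pattern matches this README; the unnamed default is an assumption.
devenv_workspace_pvc() {
  local user="${USER:-$(id -un)}" name="${1:-}"
  if [ -n "$name" ]; then
    echo "devenv-workspace-${user}-${name}"
  else
    echo "devenv-workspace-${user}"
  fi
}

# Example: after "devenv --name exp1 --down", remove that instance's repos PVC
#   oc delete pvc "$(devenv_workspace_pvc exp1)" -n machine-learning
# List all of your workspace PVCs:
#   oc get pvc -n machine-learning | grep "devenv-workspace-${USER}"
```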