Improve deployment setup: requirements.txt, install.sh, Dockerfile, and README updates #41
Hardik-369 wants to merge 3 commits into
…el warning

Addresses multiple deployment blockers from issue baidu#27:
- Add requirements.txt with all pinned dependencies (Transformers and SGLang paths)
- Add install.sh for automated uv-based environment setup
- Add Dockerfile for containerized deployment (CUDA 12.9 + Python 3.12)
- Update README with prominent warning about the custom SGLang wheel, quick-start section, and Docker usage instructions
Pull request overview
This PR improves deployment reproducibility and onboarding by adding pinned dependency installation, an automated setup script, containerization via Docker, and clearer README setup/run instructions—addressing the deployment blockers described in #27 (especially the custom SGLang wheel and lack of Docker support).
Changes:
- Added a pinned `requirements.txt` that points pip to the bundled `wheel/` directory for the custom SGLang wheel.
- Introduced `install.sh` to automate environment creation and dependency installation using `uv`.
- Added a CUDA 12.9 + Python 3.12 `Dockerfile` and expanded the README with prerequisites, quick start, and Docker usage.
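The install flow described above (bundled wheel plus pinned requirements, installed with `uv`) can be sketched as below. This is a hypothetical illustration, not the PR's actual `install.sh`: it assumes the custom wheel sits in `./wheel/` with a name matching `sglang-*.whl`, and that `requirements.txt` carries the `--find-links wheel` line. The helper names `find_sglang_wheel` and `install_env` are made up for this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of an install.sh-style flow, not the PR's actual script.
# Assumes the custom SGLang wheel sits in ./wheel/ with a name matching
# sglang-*.whl, and that requirements.txt contains a "--find-links wheel" line.

# Print the path of the single sglang-*.whl in DIR (default: wheel).
# Fails if there is not exactly one match, so a stale second wheel is caught.
find_sglang_wheel() {
  local dir="${1:-wheel}" count=0 found="" f
  for f in "$dir"/sglang-*.whl; do
    [ -e "$f" ] || continue   # an unmatched glob stays literal; skip it
    count=$((count + 1))
    found="$f"
  done
  if [ "$count" -ne 1 ]; then
    echo "expected one sglang-*.whl in $dir, found $count" >&2
    return 1
  fi
  printf '%s\n' "$found"
}

# Create .venv with uv, then install the bundled wheel and pinned requirements.
install_env() {
  local wheel
  wheel="$(find_sglang_wheel wheel)" || return 1
  command -v uv >/dev/null 2>&1 || { echo "uv is not installed" >&2; return 1; }
  uv venv .venv --python 3.12 || return 1
  uv pip install --python .venv/bin/python "$wheel" || return 1
  uv pip install --python .venv/bin/python -r requirements.txt
}
```

Calling `install_env` from the repository root would leave a `.venv` that `source .venv/bin/activate` picks up. Checking for exactly one wheel up front gives a clearer error than letting the resolver fail later on a missing or ambiguous SGLang build.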
Reviewed changes
Copilot reviewed 3 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| requirements.txt | Adds a dependency set intended to be reproducible and to resolve the bundled SGLang wheel via --find-links. |
| install.sh | Automates venv creation + installs the bundled SGLang wheel and requirements using uv. |
| Dockerfile | Provides a CUDA runtime image that installs the custom wheel + dependencies and launches an SGLang server by default. |
| README.md | Documents prerequisites, quick start installation paths, and Docker build/run examples. |
`install.sh`

```shell
#!/usr/bin/env bash
set -euo pipefail
```
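Strict mode only reaches the caller's shell when the script is sourced (`source install.sh`) rather than executed (`bash install.sh`). A minimal sketch of that difference, using two throwaway files rather than the real `install.sh`, plus one common guard that keeps strict mode for direct execution only. The helper `errexit_after_sourcing` is hypothetical.

```shell
#!/usr/bin/env bash
# Shows that `set -euo pipefail` in a sourced file changes the caller's shell,
# and a common guard that enables strict mode only on direct execution.
# Both files are throwaway examples, not the PR's install.sh.

unguarded="$(mktemp)"
guarded="$(mktemp)"

printf 'set -euo pipefail\n' > "$unguarded"

cat > "$guarded" <<'EOF'
# BASH_SOURCE[0] equals $0 only when this file is executed, not sourced.
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  set -euo pipefail
fi
EOF

# Source FILE in a fresh bash and print "on" if errexit (-e) is now set there.
errexit_after_sourcing() {
  bash -c 'source "$1"; if [[ $- == *e* ]]; then echo on; else echo off; fi' _ "$1"
}

echo "unguarded: $(errexit_after_sourcing "$unguarded")"  # prints "unguarded: on"
echo "guarded:   $(errexit_after_sourcing "$guarded")"    # prints "guarded:   off"
# Remove the temp files afterwards: rm -f "$unguarded" "$guarded"
```

The guard is an alternative to deleting strict mode outright; either way, a user who sources the script no longer inherits `errexit`, `nounset`, or `pipefail` in their interactive shell.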
`Dockerfile`

```dockerfile
ENV PATH="/app/.venv/bin:$PATH"
ENV CUDA_VISIBLE_DEVICES=0
```
`Dockerfile`

```dockerfile
ENTRYPOINT ["python", "-m", "sglang.launch_server", \
    "--model", "baidu/Unlimited-OCR", \
    "--served-model-name", "Unlimited-OCR", \
    "--attention-backend", "fa3", \
    "--page-size", "1", \
    "--mem-fraction-static", "0.8", \
    "--context-length", "32768", \
    "--enable-custom-logit-processor", \
    "--disable-overlap-schedule", \
    "--skip-server-warmup", \
    "--host", "0.0.0.0", \
    "--port", "10000"]
```
`README.md`

```shell
# Use a local model directory instead of downloading from Hugging Face
docker run --gpus all -p 10000:10000 \
  -v /path/to/local-model:/model \
  unlimited-ocr --model-dir /model
```
`README.md`

```shell
# Batch inference with infer.py (launches server automatically)
docker run --gpus all \
  -v /path/to/images:/data \
  -v /path/to/outputs:/app/outputs \
  unlimited-ocr python infer.py --image-dir /data --output-dir /app/outputs
```
- install.sh: remove `set -euo pipefail` to avoid changing the caller's shell state
- Dockerfile: remove `ENV CUDA_VISIBLE_DEVICES=0` to let users control GPU selection at runtime
- Dockerfile: change `ENTRYPOINT` to `CMD` so the `infer.py` override works
- README: fix `--model-dir` to `--model` in Docker examples
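Docker appends the arguments given after the image name to an exec-form `ENTRYPOINT`, while the same arguments replace `CMD` entirely. A small bash model of that rule makes the ENTRYPOINT-to-CMD change concrete. It is a simplification (it ignores shell form and `--entrypoint`), `resolve_argv` is a hypothetical helper, and the server command is shortened to one flag.

```shell
#!/usr/bin/env bash
# Simplified model of how Docker combines an exec-form ENTRYPOINT, CMD, and the
# arguments given after the image name in `docker run`. It ignores shell form
# and the --entrypoint flag; resolve_argv is a hypothetical helper.
#
# Usage: resolve_argv ENTRYPOINT_WORDS... -- CMD_WORDS... -- RUN_ARGS...
resolve_argv() {
  local entrypoint=() cmd=() run_args=() section=0 word
  for word in "$@"; do
    # The first two bare "--" words switch sections; later ones are run args.
    if [ "$word" = "--" ] && [ "$section" -lt 2 ]; then
      section=$((section + 1))
      continue
    fi
    case "$section" in
      0) entrypoint+=("$word") ;;
      1) cmd+=("$word") ;;
      *) run_args+=("$word") ;;
    esac
  done
  # Arguments passed to `docker run` replace CMD; ENTRYPOINT is always kept.
  if [ "${#run_args[@]}" -gt 0 ]; then
    cmd=("${run_args[@]}")
  fi
  local argv=("${entrypoint[@]}" "${cmd[@]}")
  printf '%s\n' "${argv[*]}"
}

# Original image: server command in ENTRYPOINT, no CMD.
resolve_argv python -m sglang.launch_server --port 10000 -- -- python infer.py
# prints: python -m sglang.launch_server --port 10000 python infer.py

# After the change: server command in CMD, no ENTRYPOINT.
resolve_argv -- python -m sglang.launch_server --port 10000 -- python infer.py
# prints: python infer.py
```

With the server command in `ENTRYPOINT`, the batch-inference example would have launched the server with `python infer.py ...` tacked on as extra arguments; with `CMD`, running the image with no arguments still starts the server, and passing a command swaps it out.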
Solid direction -- the

Bugs

1.
| | PR #39 (marcelMaier) | PR #41 (this PR) |
|---|---|---|
| Base image | devel (wrong, ~7.5 GB) | runtime (correct, ~3.5 GB) |
| Multi-stage build | Yes | No |
| Non-root user | Yes | No |
| `install.sh` | No | Yes |
| `requirements.txt` | No | Yes |
| `--trust-remote-code` in CMD | Missing | Missing |
| `kernels==0.11.7` | Present (wrong) | Present (wrong) |
This PR's nvidia/cuda:12.9.0-runtime-ubuntu24.04 base is the correct choice (PR #39 mistakenly uses devel). If the maintainers want to merge one Docker implementation, cherry-picking the runtime base, install.sh, and requirements.txt from this PR while adopting the multi-stage build and non-root user from PR #39 would give the best combined result.
What's done well
- `nvidia/cuda:12.9.0-runtime-ubuntu24.04` (runtime, not devel) is the right base -- cuts final image size roughly in half vs. PR #39 (Add Docker image publishing and API runtime)
- Layer ordering is cache-optimized: wheel first (changes rarely), requirements second, application code last
- `--find-links wheel` in `requirements.txt` cleanly handles the local wheel without manual path instructions
- `install.sh` is the clearest new-user onboarding addition in any open PR right now -- the prominent warning about the custom wheel alone will close several support issues
- Quick Start section in README is well-structured and covers both uv and manual pip paths
- Add `--trust-remote-code` to Docker CMD (model uses custom code)
- Remove `kernels==0.11.7` from `requirements.txt` (conflicts with SGLang wheel)
- Switch Dockerfile to non-root user (`ocr:ocr`)
- Drop `COPY assets/` from Dockerfile and add `.dockerignore`
- Fix `--image-dir`/`--output-dir` to `--image_dir`/`--output_dir` in all docs
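Renaming the `infer.py` flags across several docs is a mechanical edit. A minimal sketch with `sed`, where `fix_infer_flags` is a hypothetical helper and choosing which files to pass is left to the caller:

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the PR: rewrite --image-dir/--output-dir to
# the --image_dir/--output_dir spellings in each file given as an argument.
# `sed -i.bak` is accepted by both GNU and BSD sed; the backup is removed after.
fix_infer_flags() {
  local f
  for f in "$@"; do
    sed -i.bak -e 's/--image-dir/--image_dir/g' \
               -e 's/--output-dir/--output_dir/g' "$f" || return 1
    rm -f "$f.bak"
  done
}

# Example: fix_infer_flags README.md
```

Using a suffixed `-i.bak` instead of bare `-i` is what keeps the same command working on both Linux and macOS.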
Closes #27
This PR addresses several of the deployment blockers raised in #27.
Changes
- `--find-links wheel` to resolve the custom SGLang wheel.
- `nvidia/cuda:12.9.0-runtime-ubuntu24.04` with Python 3.12, the custom SGLang wheel, and all dependencies.