Add Docker image publishing and API runtime by marcelMaier · Pull Request #39 · baidu/Unlimited-OCR

marcelMaier · 2026-06-26T10:26:31Z

Summary

Add a CUDA-based Docker image that starts the OpenAI-compatible SGLang API server by default
Use a CUDA devel image only for the build stage and a smaller CUDA runtime image for the final stage
Build a wheelhouse in the build stage and install from it in the runtime stage to keep build dependencies out of the final image
Add --trust-remote-code to the SGLang server command for the custom Unlimited-OCR model code
Remove the redundant kernels install so the bundled SGLang wheel keeps its matching kernel package
Keep batch inference available by overriding the container command with python infer.py ...
Document Docker usage, PDF batch mode with --image_mode base, and the publishing secrets maintainers can configure

Publishing

The workflow publishes to GHCR with the built-in GITHUB_TOKEN. To also publish to Docker Hub, repository maintainers only need to add these repository secrets:

DOCKERHUB_USERNAME
DOCKERHUB_TOKEN

Validation

git diff --check
docker compose config
YAML parsed for .github/workflows/docker.yml and docker-compose.yml
Verified Docker Hub contains the selected nvidia/cuda:12.9.1-cudnn-runtime-ubuntu24.04 base image tag

marcelMaier · 2026-06-26T10:31:14Z

This is building on my other PR and activates an API Server so theres a ready to use inference.

kushdab · 2026-06-26T12:02:07Z

Well-structured PR the multi-stage build, non-root user, named volumes, and the GHCR CI workflow are all done correctly. A few issues worth fixing before this lands:

🐛 Runtime stage still uses the `devel` image doubles image size for no benefit

ARG CUDA_IMAGE=nvidia/cuda:12.9.1-cudnn-devel-ubuntu24.04
FROM ${CUDA_IMAGE} AS build   # fine: needs nvcc and headers
...
FROM ${CUDA_IMAGE} AS runtime  # ← wrong: devel is ~7–8 GB

The runtime stage only needs the CUDA runtime libraries not the compiler, headers, or cuDNN development files. Replace the second FROM:

FROM nvidia/cuda:12.9.1-cudnn-runtime-ubuntu24.04 AS runtime

This cuts the final image size by roughly half (devel ≈ 7.5 GB → runtime ≈ 3.5 GB), which matters for first-pull time and registry storage.

🐛 `--trust-remote-code` is missing from the Docker `CMD`

baidu/Unlimited-OCR ships custom model code in modeling_unlimitedocr.py. Without --trust-remote-code, SGLang will refuse to load it and exit immediately:

Error: Loading baidu/Unlimited-OCR requires --trust-remote-code

Add the flag to the CMD:

CMD ["python", "-m", "sglang.launch_server", \
    "--model", "baidu/Unlimited-OCR", \
    "--trust-remote-code", \          # ← add this
    "--served-model-name", "Unlimited-OCR", \
    ...

🐛 `kernels==0.11.7` in `requirements-sglang.txt` causes silent model registration failure

The custom SGLang wheel bundles its own sgl_kernel. Installing kernels==0.11.7 alongside it downgrades sgl_kernel, which breaks SGLang's unlimited_ocr.py model registration at import time. SGLang swallows the import error and starts anyway, then crashes at inference with ValueError: UnlimitedOCRForCausalLM is not supported by SGLang (see issue #12). PR #34 removes this line from the README; the same fix should apply to this file:

-./wheel/sglang-0.0.0.dev11416+g92e8bb79e-py3-none-any.whl
-kernels==0.11.7
-pymupdf==1.27.2.2
+./wheel/sglang-0.0.0.dev11416+g92e8bb79e-py3-none-any.whl
+pymupdf==1.27.2.2

If kernels genuinely needs to be pinned for the bundled wheel, the correct version is 0.9.0 (matching the existing wheel manifest) — but in practice the wheel already bundles the right sgl_kernel and the separate kernels install is redundant.

📖 PDF batch inference example uses `--image_mode gundam` — should be `base`

python infer.py \
    --pdf /data/document.pdf \
    --image_mode gundam    # ← wrong for PDF

gundam uses per-tile crop inference via infer(), which is for single images only. For multi-page PDF mode, base calls infer_multi() which processes the full page context. Using gundam with --pdf silently produces wrong output or empty results. Change to --image_mode base in both the README examples and the compose file defaults.

(This same issue was flagged in the reviews for PRs #21 and #36 — worth making sure it doesn't sneak into documentation.)

Multi-GPU note (informational, not blocking)

docker run --gpus all exposes all GPU devices to the container, but SGLang uses only one unless --tensor-parallel-size N is passed to launch_server. Worth a one-liner in the docs:

# Multi-GPU: add --tensor-parallel-size to match the number of GPUs
docker run ... unlimited-ocr:local  # default: single GPU
docker run ... -e TP_SIZE=4 ...     # or pass --tensor-parallel-size 4 in CMD

What's done well

Multi-stage build is clean venv copy eliminates heavy build deps from runtime layer
Non-root unlimited user with explicit UID/GID args is the right pattern
Volume layout (/data, /app/outputs, /app/log, HF cache) is sensible and matches infer.py conventions
GHCR CI with GITHUB_TOKEN + optional Docker Hub secrets is the standard approach
cancel-in-progress: true on the concurrency group prevents stale image builds from racing
paths: filter on the workflow trigger avoids unnecessary Docker builds on unrelated commits

Fix the four items above (runtime image, --trust-remote-code, kernels removal, --image_mode base in docs) and this is ready to merge.

marcelMaier mentioned this pull request Jun 26, 2026

[feat] Add Docker image publishing for easy adoption for everyone. #37

Closed

marcelMaier force-pushed the docker-image-publishing branch from b79f954 to 6712ac3 Compare June 26, 2026 14:02

Add Docker image publishing and API runtime

1844d87

marcelMaier force-pushed the docker-image-publishing branch from 6712ac3 to 1844d87 Compare June 26, 2026 14:14

kushdab mentioned this pull request Jun 27, 2026

Improve deployment setup: requirements.txt, install.sh, Dockerfile, and README updates #41

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add Docker image publishing and API runtime#39

Add Docker image publishing and API runtime#39
marcelMaier wants to merge 1 commit into
baidu:mainfrom
marcelMaier:docker-image-publishing

marcelMaier commented Jun 26, 2026 •

edited

Loading

Uh oh!

marcelMaier commented Jun 26, 2026

Uh oh!

kushdab commented Jun 26, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

marcelMaier commented Jun 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Publishing

Validation

Uh oh!

marcelMaier commented Jun 26, 2026

Uh oh!

kushdab commented Jun 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🐛 Runtime stage still uses the devel image doubles image size for no benefit

🐛 --trust-remote-code is missing from the Docker CMD

🐛 kernels==0.11.7 in requirements-sglang.txt causes silent model registration failure

📖 PDF batch inference example uses --image_mode gundam — should be base

Multi-GPU note (informational, not blocking)

What's done well

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

marcelMaier commented Jun 26, 2026 •

edited

Loading

kushdab commented Jun 26, 2026 •

edited

Loading

🐛 Runtime stage still uses the `devel` image doubles image size for no benefit

🐛 `--trust-remote-code` is missing from the Docker `CMD`

🐛 `kernels==0.11.7` in `requirements-sglang.txt` causes silent model registration failure

📖 PDF batch inference example uses `--image_mode gundam` — should be `base`