fix(infer): add max_tokens guard + --trust-remote-code for SGLang by kushdab · Pull Request #29 · baidu/Unlimited-OCR

kushdab · 2026-06-25T15:58:00Z

Summary

Two targeted fixes for infer.py to address hanging inference and SGLang startup errors.

Fix 1: `max_tokens` guard in request payload (fixes #24)

Problem: When inference is run on certain images (sparse tables, high-contrast layouts), the model enters a detection-token repetition loop and never emits EOS. Because infer.py passes no token limit to the SGLang server, generation hangs indefinitely.

Fix: Add "max_tokens": CONTEXT_LENGTH to the request payload. This re-uses the existing CONTEXT_LENGTH constant (default 32768) so the guard is already configurable via --context_length. A natural EOS before the limit is completely unaffected; only runaway generation is stopped.

payload = {
    ...
    "max_tokens": CONTEXT_LENGTH,  # prevents infinite generation loops
}

Fix 2: `--trust-remote-code` in SGLang server launch (fixes #12, part of #27)

Problem: SGLang raises ValueError: Model architecture UnlimitedOCRForCausalLM is not supported on startup because the custom model class is not registered in the standard SGLang model registry.

Fix: Pass --trust-remote-code when launching the server, which tells SGLang to load the model class from the repo's modeling_unlimitedocr.py (same mechanism as Transformers).

python -m sglang.launch_server \
  --model baidu/Unlimited-OCR \
  --trust-remote-code \   # <-- added
  --port 30000

Testing

Tested the hang fix by passing an image that previously caused the loop: generation now terminates at the token limit with a partial but useful output, and the next file proceeds normally.
SGLang server starts without ValueError with --trust-remote-code.

Closes #24 · Part of #27 · Ref #12

- Add `max_tokens: CONTEXT_LENGTH` to SGLang request payload so pathological inputs (repetitive table detection loops) can never cause infer.py to hang indefinitely. The partial output is still captured and saved; a natural EOS before the limit is unaffected. - Pass `--trust-remote-code` when launching the SGLang server so the custom UnlimitedOCRForCausalLM architecture is recognized without a ValueError on startup (fixes baidu#12, baidu#27). Fixes baidu#24 Closes part of baidu#27 (item 2) Ref baidu#12

@emanthen

max_tokens in the OpenAI-compat API is output-only; image + prompt tokens consume part of the 32768 context window before generation begins. A dense 1024px document image uses 1500-4000 image tokens depending on resolution and crop_mode, leaving potentially less than 32768 - image_tokens tokens available for output. Setting max_tokens = CONTEXT_LENGTH (32768) can silently truncate large-image inputs because total_tokens = image_tokens + prompt_tokens + output_tokens <= context_length. Fix: subtract 4096 as a conservative image-token headroom (CONTEXT_LENGTH - 4096 = 28672). Natural EOS before the limit is unaffected; only runaway generation is stopped. Addresses review feedback from @emanthen on baidu#27.

emanthen mentioned this pull request Jun 25, 2026

High-level deployment blockers for community adoption #27

Open

kushdab mentioned this pull request Jun 26, 2026

infer.py: CLI flags for ngram params, --resume, --results_jsonl, --max_pages/images, mode validation, tempdir cleanup #21

Open

emanthen mentioned this pull request Jun 26, 2026

fix(infer+docs): PDF tmpdir cleanup + remove broken kernels install from README #34

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(infer): add max_tokens guard + --trust-remote-code for SGLang#29

fix(infer): add max_tokens guard + --trust-remote-code for SGLang#29
kushdab wants to merge 2 commits into
baidu:mainfrom
kushdab:fix/infer-max-tokens-and-trust-remote-code

kushdab commented Jun 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

kushdab commented Jun 25, 2026

Summary

Fix 1: max_tokens guard in request payload (fixes #24)

Fix 2: --trust-remote-code in SGLang server launch (fixes #12, part of #27)

Testing

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Fix 1: `max_tokens` guard in request payload (fixes #24)

Fix 2: `--trust-remote-code` in SGLang server launch (fixes #12, part of #27)