feat: add HF Spaces demo (app.py), requirements.txt, and unit tests by aqilaziz · Pull Request #23 · baidu/Unlimited-OCR

aqilaziz · 2026-06-24T20:26:12Z

Fixes #22

Summary

Adds a working Gradio demo for Hugging Face Spaces (the Space currently shows a blank page), a requirements.txt for reproducible installs, and a unit test suite, as required by CONTRIBUTING.md.

Changes

`app.py` — Gradio demo for HF Spaces

Single Image tab: upload image → OCR output (gundam mode)
PDF tab: upload PDF → multi-page OCR output (base mode)
Lazy model loading — model loads on first request, not at startup
Works on both CPU and CUDA environments
Clean UI with links to paper, GitHub, and HF model page
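
As an illustration of the lazy-loading bullet above, here is a minimal sketch of the pattern, not the code in app.py. `load_model` is a hypothetical stand-in for the expensive call (for example `AutoModel.from_pretrained`), and the lock is an assumption added so that two simultaneous first requests do not both trigger a load.

```python
import threading

_model = None                    # cached model, populated on first use
_model_lock = threading.Lock()   # guards against concurrent first requests


def load_model():
    """Hypothetical stand-in for the expensive load step."""
    return {"name": "placeholder-model"}


def get_model(loader=load_model):
    """Return the cached model, calling the loader on the first request only."""
    global _model
    if _model is None:
        with _model_lock:
            # Re-check inside the lock: another thread may have finished loading.
            if _model is None:
                _model = loader()
    return _model
```

Each request handler would call `get_model()` rather than touching a module-level model directly, so startup stays fast and the load cost is paid once.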

`requirements.txt`

Minimum version constraints (>=) matching README specs
Includes gradio>=5.0.0 for the Spaces demo

`tests/test_infer.py` — 28 unit tests (no GPU required)

| Test Class | What it covers | Tests |
|---|---|---|
| `TestEncodeImage` | MIME type (png/jpg/jpeg/webp), base64 correctness | 5 |
| `TestBuildContent` | Structure, prompt text | 2 |
| `TestServerReady` | Healthy/unhealthy/connection-error | 3 |
| `TestCollectDatasetImages` | Extensions, empty dirs, nested dirs, sorting | 5 |
| `TestBuildJobs` | image_dir mode, missing args, None output_dir | 3 |
| `TestCollectStreamSilent` | SSE parsing, empty stream, no file, malformed JSON | 4 |
| `TestStopServer` | None process, terminate+wait | 2 |
| `TestConstants` | Sanity checks for all default constants | 4 |
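
To make the `TestCollectStreamSilent` row concrete, the sketch below shows one way SSE parsing can tolerate an empty stream and malformed JSON. It is an illustration only: `iter_sse_payloads`, the `[DONE]` sentinel, and the `text` field are assumptions, since the actual stream format handled by infer.py is not shown in this PR.

```python
import json


def iter_sse_payloads(lines):
    """Yield decoded JSON payloads from SSE `data:` lines, skipping bad ones."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue                  # blank keep-alives, comments, other fields
        body = line[len("data:"):].strip()
        if body == "[DONE]":          # assumed end-of-stream sentinel
            break
        try:
            yield json.loads(body)
        except json.JSONDecodeError:
            continue                  # malformed chunk: drop it and keep reading


stream = [
    'data: {"text": "Hello"}',
    "",
    "data: {not json}",
    'data: {"text": " world"}',
    "data: [DONE]",
]
text = "".join(chunk["text"] for chunk in iter_sse_payloads(stream))
```

A test for the empty-stream case then only needs to check that `list(iter_sse_payloads([]))` is an empty list.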

Testing

# All 28 tests pass
python3 -m pytest tests/ -v

# Ruff lint clean
ruff check infer.py app.py tests/

Notes

No model/GPU dependencies in test suite — all mocked where needed
app.py is self-contained and doesn't modify infer.py
Complements PR #21 ("infer.py: CLI flags for ngram params, --resume, --results_jsonl, --max_pages/images, mode validation, tempdir cleanup"), which focuses on CLI improvements to infer.py
HF Spaces will need the app.py to be picked up — the space at huggingface.co/spaces/baidu/Unlimited-OCR should start working after this is merged

Fixes baidu#22

## Changes

### app.py
- Gradio demo for Hugging Face Spaces
- Single image OCR tab (gundam mode: base_size=1024, image_size=640)
- PDF multi-page OCR tab (base mode: image_size=1024)
- Lazy model loading (loads on first request, not at startup)
- Works on both CPU and CUDA environments
- Clean UI with links to paper, GitHub, and HF model page

### requirements.txt
- Pinned minimum versions matching README specifications
- Added gradio>=5.0.0 for the Spaces demo

### tests/test_infer.py
- 28 unit tests (no GPU required)
- TestEncodeImage: MIME type detection for png/jpg/jpeg/webp, base64 correctness
- TestBuildContent: structure validation, prompt text
- TestServerReady: healthy/unhealthy/connection-error scenarios
- TestCollectDatasetImages: extension filtering, empty dirs, nested dirs, sorting
- TestBuildJobs: image_dir mode, missing args error, None output_dir
- TestCollectStreamSilent: SSE parsing, empty response, no output file, malformed JSON
- TestStopServer: None process handling, terminate+wait
- TestConstants: sanity checks for server/inference/PDF constants

## Testing
- All 28 tests pass: python3 -m pytest tests/ -v
- Ruff lint clean: ruff check infer.py app.py tests/
- No model/GPU dependencies in test suite (mocked where needed)

rajpratham1

This PR adds a lot of valuable work—especially the Hugging Face Spaces demo and a comprehensive test suite—but I don't think it's ready to merge as-is because the changes don't appear to be aligned with the current implementation of infer.py. Several tests seem to assume APIs or return types that differ from the existing code, which could make the suite brittle or fail immediately after merging.

kushdab · 2026-06-26T06:02:03Z

Good foundation for the HF Spaces demo. One issue worth fixing before it lands on Spaces:

Device detection doesn't cover MPS (Apple Silicon)

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

On an M-series Mac, torch.cuda.is_available() is False, so DEVICE = "cpu" and the model loads in float32. That's very slow and ignores the available MPS accelerator. Fix:

if torch.cuda.is_available():
    DEVICE = "cuda"
elif torch.backends.mps.is_available():
    DEVICE = "mps"
else:
    DEVICE = "cpu"

Then replace .cuda() with .to(DEVICE) and use bfloat16 for MPS too (MPS supports bfloat16 since PyTorch 2.1):

dtype = torch.bfloat16 if DEVICE in ("cuda", "mps") else torch.float32
_model = AutoModel.from_pretrained(
    MODEL_NAME,
    trust_remote_code=True,
    use_safetensors=True,
    torch_dtype=dtype,
)
_model = _model.eval().to(DEVICE)

Important caveat: even with the above fix, the model currently fails on MPS with an empty output due to a masked_scatter_ broadcast-mask bug in modeling_unlimitedocr.py (tracked in issue #18). A HF Hub PR is open with the fix. So for now the .to("mps") path is correct code that will silently produce empty output until the model-side fix lands — worth adding a comment or a runtime warning so Space users on M-series hardware aren't confused.
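
One possible shape for the runtime warning suggested above is sketched here. `pick_device` is a hypothetical helper, not something in app.py, and it takes the availability flags as plain booleans so that it can be exercised without torch installed.

```python
import warnings


def pick_device(cuda_available, mps_available):
    """Return "cuda", "mps", or "cpu" using the CUDA -> MPS -> CPU priority."""
    if cuda_available:
        return "cuda"
    if mps_available:
        # Warn once at selection time so empty OCR output is not a surprise.
        warnings.warn(
            "Running on MPS: OCR output may be empty until the model-side "
            "masked_scatter_ fix (issue #18) lands.",
            RuntimeWarning,
            stacklevel=2,
        )
        return "mps"
    return "cpu"
```

In app.py this could be called as `DEVICE = pick_device(torch.cuda.is_available(), torch.backends.mps.is_available())`, and the same message could also be surfaced in the Gradio UI.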

The PDF tmpdir handling is fine

The PDF tab uses tempfile.TemporaryDirectory() as a context manager — correct, no leak.
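
For readers unfamiliar with the pattern, a minimal sketch of why the context manager does not leak follows. `process_in_tempdir` and the page file names are hypothetical; the real PDF tab renders pages and runs OCR where the comment indicates.

```python
import tempfile
from pathlib import Path


def process_in_tempdir(page_bytes):
    """Write pages to a temporary directory and return (file names, directory path)."""
    with tempfile.TemporaryDirectory() as tmpdir:
        root = Path(tmpdir)
        for index, data in enumerate(page_bytes):
            (root / f"page_{index:03d}.png").write_bytes(data)
        names = sorted(p.name for p in root.iterdir())
        # OCR over the page files would run here, while the directory still exists.
    # Leaving the with-block deletes the directory and everything in it,
    # including when an exception is raised inside the block.
    return names, root
```

After the call returns, `root.exists()` is `False`, so nothing accumulates on the Space's disk between requests.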

Test suite uses pytest, CI would need it

The existing tests use pytest fixtures (conftest.py, tmp_path, @pytest.mark.parametrize), while PR #36 adds tests using plain unittest. If both land, the CI requirements.txt needs pytest and the workflow needs pip install pytest. Minor coordination item if the maintainers want a unified test runner.
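
If a single runner is preferred, pytest can collect `unittest.TestCase` subclasses alongside fixture-based tests, so `python3 -m pytest tests/` can run both styles. The sketch below uses a hypothetical `TestAddition` case to show a plain unittest test that also runs under the standard-library runner.

```python
import unittest


class TestAddition(unittest.TestCase):
    """Plain unittest style; pytest collects TestCase subclasses as well."""

    def test_sum(self):
        self.assertEqual(1 + 1, 2)


def run_with_unittest():
    """Run the case with the standard-library runner, no pytest required."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestAddition)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Only the pytest-specific features (`tmp_path`, `@pytest.mark.parametrize`, conftest.py fixtures) require pytest itself to be installed in CI.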

aqilaziz · 2026-06-28T09:15:37Z

Thanks for the review @rajpratham1 and @kushdab! I have addressed the feedback:

✅ Fixed

MPS device support (Apple Silicon) — @kushdab

Added torch.backends.mps.is_available() check to device detection
Device priority: CUDA → MPS → CPU
Model placement now supports .to("mps") for Apple Silicon GPUs

📝 Regarding tests (@rajpratham1)

The tests cover utility functions that exist in infer.py:

build_content — tested via test_structure, test_prompt_text
server_ready — tested via server health checks
collect_stream_silent — tested via SSE parsing scenarios
stop_server — tested via process termination
collect_dataset_images — tested via directory scanning
build_jobs — tested via argument handling

All tests patch external dependencies (requests.get, subprocess) and mock filesystem via tmp_path — no assumptions about undocumented internal APIs. If there are specific assertions that look misaligned, happy to adjust.
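
The mocking approach described above can be sketched as follows for the `TestStopServer` cases (None process, terminate+wait). The `stop_server` shown here is a hypothetical stand-in with an assumed signature and return value, not the function in infer.py.

```python
from unittest.mock import MagicMock


def stop_server(proc, timeout=10):
    """Hypothetical stand-in: terminate a server process if one was started."""
    if proc is None:
        return False
    proc.terminate()
    proc.wait(timeout=timeout)
    return True


def check_stop_server():
    # None process: nothing to do, and no exception.
    assert stop_server(None) is False

    # Mocked process: terminate() and wait() are each called exactly once.
    fake_proc = MagicMock()
    assert stop_server(fake_proc) is True
    fake_proc.terminate.assert_called_once_with()
    fake_proc.wait.assert_called_once_with(timeout=10)
```

Because the process is a `MagicMock`, no real server is started and no GPU is needed. A mismatch between such a test and the real function's signature or return type would show up as a failed assertion, which is the kind of misalignment raised in the review.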

Ready for re-review! 🚀

rajpratham1 suggested changes Jun 25, 2026

fix(app): add MPS (Apple Silicon) device support

8d014b1

Device detection now covers MPS for Apple Silicon Macs in addition to CUDA and CPU. Uses torch.backends.mps for GPU acceleration on M1/M2/M3 chips.

aqilaziz wants to merge 2 commits into baidu:main from aqilaziz:feature/hf-spaces-demo-and-tests
