Manazir OCR — Arabic optics‑inspired multi‑model OCR

Manazir OCR — Arabic optics‑inspired multi‑model OCR

Fork notice: This is harissaninja/Manazir-OCR, a fork of h9-tec/Manazir-OCR patched for API-only installs. The original requires torch, torchvision, transformers, and accelerate as core dependencies (~3-5GB). This fork moves those to an optional [local-models] extra so you can install the CLI + remote API backends (vLLM, Mistral OCR, OpenAI GPT-4o) without the ML stack.

One-line install
pip install git+https://github.com/harissaninja/Manazir-OCR.git
That's it — no clone, no venv step needed. The manazir CLI is ready to use after this single command.

API-only: pip install git+https://github.com/harissaninja/Manazir-OCR.git

Full local models: pip install git+https://github.com/harissaninja/Manazir-OCR.git#egg=manazir-ocr[local-models]

Manazir OCR is an Arabic‑first, layout‑aware OCR framework inspired by Ibn al‑Haytham’s Kitāb al‑Manāẓir (Book of Optics). It orchestrates multiple backends (local Transformers, vLLM servers, lightweight engines, and commercial APIs) to extract high‑quality text from PDFs and images, producing Markdown/HTML with layout blocks and figure crops.

🌟 Features

Arabic‑first multi‑model architecture: choose or auto‑route among VLMs and classic OCR.
Layout‑aware outputs: HTML/Markdown with reading order, chunk metadata, and image extraction.
Flexible runtimes: local Hugging Face (Transformers) or remote vLLM; pluggable registry + factory.
CLI and Streamlit UIs: production CLI and two apps (basic + professional, with an optional Nerd theme).
Extensible: add models via a simple registry entry; commercial APIs optional.

🚀 Quickstart

Install from source (recommended):

python -m venv venv
source venv/bin/activate  # on Windows: venv\Scripts\activate
pip install -U pip
pip install -e .

Run the CLI:

# Batch processing with layout-aware pipeline (vLLM or HF)
manazir process <input_path> <output_dir> --method vllm --paginate_output

# Arabic-first single-file OCR with model selection
manazir ocr file.pdf --language ar --quality highest --device cuda

# Browse models and get recommendations
manazir list-available-models
manazir recommend --language ar --document-type handwritten

Launch the apps:

manazir_app          # Basic demo (choose Classic hf/vllm or pick a Registry model)
manazir_app_pro      # Professional UI (toggle Nerd Theme in sidebar)

▶️ How to run

Run the Professional app (recommended):

manazir_app_pro
# then open http://localhost:8501

Alternative:

python -m docustruct.scripts.run_app_professional

Run via CLI:

# Convert a PDF to Markdown using Arabic-first defaults
manazir ocr input.pdf --language ar --output out_dir

# List models and pick one
manazir list-available-models
manazir ocr input.png --language ar --model qwen2_vl_2b

🧩 Supported Arabic models (highlights)

Specialized: qari_ocr, dimi_arabic_ocr, ocr_rl2, trocr_arabic
Multilingual VLMs: qwen2_vl_2b, qwen2_5_vl_7b_arabic
Lightweight: paddle_ocr_arabic, paddle_ocr_arabic_v4, tesseract, easy_ocr
Layout toolkit: surya_ocr, surya_ocr_arabic
Commercial (optional): openai_gpt4o, mistral_ocr

Additions in this repo: qwen2_5_vl_7b_arabic, paddle_ocr_arabic_v4, surya_ocr_arabic, qari_ocr_waraqon.

Model browser

🛠️ Programmatic usage

from docustruct.model import create_model
from PIL import Image

model = create_model("qwen2_5_vl_7b_arabic", device="cuda")
img = Image.open("page.png").convert("RGB")
result = model.process_image(img)
print(result.text)

⚙️ vLLM (optional)

The CLI’s vLLM path uses OpenAI‑compatible endpoints. Configure via env:

export VLLM_API_KEY=EMPTY
export VLLM_API_BASE=http://localhost:8000/v1
export VLLM_MODEL_NAME=manazir

📑 Notes

Import path remains docustruct.* for now; package name is manazir-ocr.
Some backends require extra installs (e.g., paddleocr). Commercial APIs need keys and may incur costs.

📁 Assets

Store project screenshots and images in assets/.
Included: assets/screenshot-ui.png, assets/screenshot-available-models.png.

👤 Author

Original development by Hesham Haroon — contact: heshamharoon19@gmail.com. API-only fork by harissaninja — see github.com/harissaninja/Manazir-OCR.

📜 License

Code is Apache‑2.0. Some integrated models (e.g., Surya) are GPL‑3.0; verify licenses before use.

Name		Name	Last commit message	Last commit date
Latest commit History 23 Commits
.github/workflows		.github/workflows
.streamlit		.streamlit
assets		assets
docustruct		docustruct
tests		tests
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
.python-version		.python-version
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
MODEL_LICENSE		MODEL_LICENSE
README.md		README.md
pyproject.toml		pyproject.toml
pytest.ini		pytest.ini
requirements.txt		requirements.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Manazir OCR — Arabic optics‑inspired multi‑model OCR

One-line install

🌟 Features

🚀 Quickstart

▶️ How to run

🧩 Supported Arabic models (highlights)

Model browser

🛠️ Programmatic usage

⚙️ vLLM (optional)

📑 Notes

📁 Assets

👤 Author

📜 License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Manazir OCR — Arabic optics‑inspired multi‑model OCR

One-line install

🌟 Features

🚀 Quickstart

▶️ How to run

🧩 Supported Arabic models (highlights)

Model browser

🛠️ Programmatic usage

⚙️ vLLM (optional)

📑 Notes

📁 Assets

👤 Author

📜 License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages