OAR-OCR

An Optical Character Recognition (OCR) and Document Layout Analysis library written in Rust.

Quick Start

Installation

cargo add oar-ocr

With GPU support:

cargo add oar-ocr --features cuda

With auto-download of model files from ModelScope:

cargo add oar-ocr --features auto-download

With this feature enabled, bare file names passed to the builders are fetched from ModelScope into $OAR_HOME (default ~/.oar) and verified against their expected SHA-256 checksums. See docs/models.md for the exact path resolution rules.
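
To make the idea of a "bare file name" concrete, here is a minimal std-only sketch of how such a lookup could work. The helper names (`is_bare_file_name`, `cache_root`, `resolve_model_path`) are hypothetical, and placing files directly under the cache root is an assumption; the crate's actual rules live in docs/models.md and may differ.

```rust
use std::path::{Component, Path, PathBuf};

/// True when `name` is a single plain file-name component:
/// no directories, no `.` or `..`, and not an absolute path.
fn is_bare_file_name(name: &str) -> bool {
    let mut components = Path::new(name).components();
    matches!(
        (components.next(), components.next()),
        (Some(Component::Normal(_)), None)
    )
}

/// Cache root: `$OAR_HOME` when set and non-empty, otherwise `<home>/.oar`.
fn cache_root(oar_home: Option<&str>, home_dir: Option<&str>) -> Option<PathBuf> {
    match oar_home {
        Some(dir) if !dir.is_empty() => Some(PathBuf::from(dir)),
        _ => home_dir.map(|home| Path::new(home).join(".oar")),
    }
}

/// ASSUMPTION: bare names map straight into the cache root;
/// anything that looks like a path is used exactly as given.
fn resolve_model_path(name: &str, root: &Path) -> PathBuf {
    if is_bare_file_name(name) {
        root.join(name)
    } else {
        PathBuf::from(name)
    }
}

fn main() {
    let oar_home = std::env::var("OAR_HOME").ok();
    let home = std::env::var("HOME").ok();
    if let Some(root) = cache_root(oar_home.as_deref(), home.as_deref()) {
        let path = resolve_model_path("pp-ocrv5_mobile_det.onnx", &root);
        println!("{}", path.display());
    }
}
```

Under these assumptions, `models/det.onnx` and `./det.onnx` are treated as local paths and never trigger a download, while `det.onnx` resolves into the cache directory.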

Basic Usage

use oar_ocr::prelude::*;
use std::path::Path;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Initialize the OCR pipeline
    let ocr = OAROCRBuilder::new(
        "pp-ocrv5_mobile_det.onnx",
        "pp-ocrv5_mobile_rec.onnx",
        "ppocrv5_dict.txt",
    )
    .build()?;

    // Load an image
    let image = load_image(Path::new("document.jpg"))?;
    
    // Run prediction
    let results = ocr.predict(vec![image])?;

    // Process results
    for text_region in &results[0].text_regions {
        if let Some((text, confidence)) = text_region.text_with_confidence() {
            println!("Text: {} ({:.2})", text, confidence);
        }
    }

    Ok(())
}
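
A common next step after the loop above is to drop low-confidence regions before using the text. The sketch below is std-only and works on plain `(text, confidence)` pairs, which you would collect from `text_with_confidence()`; the `(String, f32)` shape and the `keep_confident` helper are assumptions for illustration, not part of the crate's API.

```rust
/// Keep only the texts whose confidence is at least `min_confidence`.
/// `regions` holds (text, confidence) pairs collected from the OCR results.
fn keep_confident<'a>(regions: &'a [(String, f32)], min_confidence: f32) -> Vec<&'a str> {
    regions
        .iter()
        .filter(|(_, confidence)| *confidence >= min_confidence)
        .map(|(text, _)| text.as_str())
        .collect()
}

fn main() {
    // Hypothetical values standing in for real OCR output.
    let regions = vec![
        ("Invoice".to_string(), 0.98),
        ("l1I|".to_string(), 0.31),
        ("Total".to_string(), 0.87),
    ];

    for text in keep_confident(&regions, 0.5) {
        println!("{text}");
    }
}
```

The right cutoff depends on the model and the documents; 0.5 here is only a placeholder.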

PP-OCRv6

PP-OCRv6 ships in three sizes. Either pass the bare file names below with the auto-download feature enabled, so they are fetched automatically, or point the builder at local paths:

| Size | Detection | Recognition | Dictionary |
| --- | --- | --- | --- |
| tiny | `pp-ocrv6_tiny_det.onnx` | `pp-ocrv6_tiny_rec.onnx` | `ppocrv6_tiny_dict.txt` |
| small | `pp-ocrv6_small_det.onnx` | `pp-ocrv6_small_rec.onnx` | `ppocrv6_dict.txt` |
| medium | `pp-ocrv6_medium_det.onnx` | `pp-ocrv6_medium_rec.onnx` | `ppocrv6_dict.txt` |
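
If the size is chosen at runtime, the table above can be encoded once so that the tiny-only dictionary is never paired with the wrong recognizer. This is a small std-only sketch; `V6Size` and `ModelFiles` are hypothetical names, not types from the crate, and only the file names come from the table.

```rust
/// The three PP-OCRv6 sizes listed in the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum V6Size {
    Tiny,
    Small,
    Medium,
}

/// File names for one size, in the order expected by `OAROCRBuilder::new`.
#[derive(Debug, PartialEq, Eq)]
struct ModelFiles {
    detection: &'static str,
    recognition: &'static str,
    dictionary: &'static str,
}

impl V6Size {
    fn model_files(self) -> ModelFiles {
        match self {
            V6Size::Tiny => ModelFiles {
                detection: "pp-ocrv6_tiny_det.onnx",
                recognition: "pp-ocrv6_tiny_rec.onnx",
                // Only the tiny model uses its own dictionary.
                dictionary: "ppocrv6_tiny_dict.txt",
            },
            V6Size::Small => ModelFiles {
                detection: "pp-ocrv6_small_det.onnx",
                recognition: "pp-ocrv6_small_rec.onnx",
                dictionary: "ppocrv6_dict.txt",
            },
            V6Size::Medium => ModelFiles {
                detection: "pp-ocrv6_medium_det.onnx",
                recognition: "pp-ocrv6_medium_rec.onnx",
                dictionary: "ppocrv6_dict.txt",
            },
        }
    }
}

fn main() {
    for size in [V6Size::Tiny, V6Size::Small, V6Size::Medium] {
        let files = size.model_files();
        println!(
            "{size:?}: {} / {} / {}",
            files.detection, files.recognition, files.dictionary
        );
    }
}
```

The three fields can then be passed to `OAROCRBuilder::new` in place of the string literals.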

use oar_ocr::prelude::*;
use oar_ocr::domain::tasks::TextDetectionConfig;
use std::path::Path;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // PP-OCRv6 "tiny" — swap in small/medium (with ppocrv6_dict.txt) for higher accuracy.
    let ocr = OAROCRBuilder::new(
        "pp-ocrv6_tiny_det.onnx",
        "pp-ocrv6_tiny_rec.onnx",
        "ppocrv6_tiny_dict.txt",
    )
    // Official PP-OCRv6 detection defaults.
    .text_detection_config(TextDetectionConfig {
        score_threshold: 0.2,
        box_threshold: 0.45,
        unclip_ratio: 1.4,
        ..Default::default()
    })
    .build()?;

    let image = load_image(Path::new("document.jpg"))?;
    let results = ocr.predict(vec![image])?;

    for region in &results[0].text_regions {
        if let Some((text, confidence)) = region.text_with_confidence() {
            println!("{text}  ({confidence:.2})");
        }
    }

    Ok(())
}
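
For intuition about `unclip_ratio`: in the DB-style post-processing used by PaddleOCR's reference implementation, each detected polygon is expanded outward by a distance of area × unclip_ratio / perimeter. The std-only sketch below applies that formula to an axis-aligned rectangle. Whether oar-ocr computes the expansion in exactly this way is an assumption, and both function names are hypothetical.

```rust
/// Outward offset for a polygon with the given area and perimeter:
/// distance = area * unclip_ratio / perimeter.
fn unclip_distance(area: f32, perimeter: f32, unclip_ratio: f32) -> f32 {
    area * unclip_ratio / perimeter
}

/// Size of the bounding box of an axis-aligned `width` x `height` rectangle
/// after it has been pushed outward by the unclip distance on every side.
fn unclip_rect(width: f32, height: f32, unclip_ratio: f32) -> (f32, f32) {
    let distance = unclip_distance(width * height, 2.0 * (width + height), unclip_ratio);
    (width + 2.0 * distance, height + 2.0 * distance)
}

fn main() {
    // A 100 x 20 px text line with the unclip_ratio used in the example above.
    let (width, height) = unclip_rect(100.0, 20.0, 1.4);
    println!("expanded box: {width:.1} x {height:.1} px");
}
```

With these numbers the offset is roughly 11.7 px per side, so a larger `unclip_ratio` yields looser boxes around each text line and a smaller one yields tighter boxes.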

Document Structure Analysis

use oar_ocr::prelude::*;
use std::path::Path;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Initialize structure analysis pipeline
    let structure = OARStructureBuilder::new("pp-doclayout_plus-l.onnx")
        .with_table_classification("pp-lcnet_x1_0_table_cls.onnx")
        .with_table_structure_recognition("slanet_plus.onnx", "wireless")
        .table_structure_dict_path("table_structure_dict_ch.txt")
        .with_ocr(
            "pp-ocrv5_mobile_det.onnx",
            "pp-ocrv5_mobile_rec.onnx",
            "ppocrv5_dict.txt",
        )
        .build()?;
        
    // Analyze document
    let result = structure.predict("document.jpg")?;
    
    // Output Markdown
    println!("{}", result.to_markdown());
    
    Ok(())
}

Vision-Language Models (VLM)

For advanced document understanding with Vision-Language Models (such as PaddleOCR-VL, PaddleOCR-VL-1.5, PaddleOCR-VL-1.6, GLM-OCR, HunyuanOCR, and MinerU2.5), see the oar-ocr-vl crate.

Documentation

Usage Guide - Detailed API usage, builder patterns, GPU configuration
Pre-trained Models - Model download links and recommended configurations

Examples

The examples/ directory contains complete examples for various tasks:

# General OCR
cargo run --example ocr -- --help

# Document Structure Analysis
cargo run --example structure -- --help

# Layout Detection
cargo run --example layout_detection -- --help

# Table Structure Recognition
cargo run --example table_structure_recognition -- --help

Acknowledgments

This project builds upon the excellent work of several open-source projects:

ort: Rust bindings for ONNX Runtime by pykeio. This crate provides the Rust interface to ONNX Runtime that powers the efficient inference engine in this OCR library.
PaddleOCR: Baidu's awesome multilingual OCR toolkits based on PaddlePaddle. This project utilizes PaddleOCR's pre-trained models, which provide excellent accuracy and performance for text detection and recognition across multiple languages.
Candle: A minimalist ML framework for Rust by Hugging Face. We use Candle to implement Vision-Language model inference.
