opendatalab/OmniDocLayout

OmniDocLayout: Towards Diverse Document Layout Generation via
Coarse-to-Fine LLM Learning

📢 Latest News

  • [2026.06.10]: 🔥 The OmniDocLayout-1M dataset is now available for download on Hugging Face.
  • [2026.04.09]: OmniDocLayout has been selected as a "Highlight" paper at CVPR 2026! 🎉🎉🎉
  • [2026.02.21]: OmniDocLayout has been accepted by CVPR 2026! 🎉🎉🎉
  • [2025.11.24]: We have released our paper on arXiv.

📝 Overview

Document AI has advanced rapidly and attracted increasing attention in both academia and industry. However, while most existing efforts focus on document layout analysis (DLA), its generative counterpart, document layout generation, remains relatively underexplored.

Compared with traditional graphic layout design or room layout planning, document layout generation is more challenging because each page usually contains a larger number of elements and exhibits more diverse structural patterns. Existing document layout generation datasets are often dominated by simple academic paper layouts, while modern and complex document types such as newspapers, magazines, textbooks, exam papers, and slides remain underrepresented.

To address these limitations, we introduce OmniDocLayout, a new framework for diverse document layout generation. It has two main components:

  • OmniDocLayout-1M: the first million-scale dataset for diverse document layout generation, covering six common document types and approximately 48M annotated layout elements.
  • OmniDocLayout-LLM: a lightweight 0.5B LLM trained with a two-stage Coarse-to-Fine learning paradigm, which first learns general layout principles from OmniDocLayout-1M and then adapts to fine-grained complex document domains.

[Figure: OmniDocLayout overview]

🏆 Contribution

  • We introduce OmniDocLayout-1M, the first million-scale document layout dataset for diverse document layout generation, covering six common document types: textbook, newspaper, magazine, exam, academic paper, and slide.
  • We propose OmniDocLayout-LLM, a lightweight 0.5B model trained with a Coarse-to-Fine learning paradigm, enabling effective transfer from coarse document layout principles to fine-grained complex domains.
  • Extensive experiments on M6Doc demonstrate that OmniDocLayout-LLM achieves strong performance across multiple document types and layout generation tasks.

📦 Dataset

OmniDocLayout-1M is designed to support large-scale training for document layout generation. It covers six common document types from real-world scenarios:

| Type | File | Volume |
| --- | --- | ---: |
| Textbook | textbook.json | 200,000 |
| Newspaper | newspaper.json | 207,679 |
| Magazine | magazine.json | 195,008 |
| Exam paper | exam.json | 90,360 |
| Academic paper | academic.json | 200,000 |
| Slide | slide.json | 100,000 |
| Total | - | 993,047 |
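
The volumes above are uneven across types, which matters if you sample pages per type during training. The following is a small sketch that recomputes the total and each type's share from the table; the dictionary keys are simply the file stems, and nothing here reads the actual JSON files, whose schema is not described in this README.

```python
# Page counts per document type, copied from the table above.
VOLUMES = {
    "textbook": 200_000,
    "newspaper": 207_679,
    "magazine": 195_008,
    "exam": 90_360,
    "academic": 200_000,
    "slide": 100_000,
}


def type_shares(volumes):
    """Return each type's fraction of the total page count."""
    total = sum(volumes.values())
    return {name: count / total for name, count in volumes.items()}


total_pages = sum(VOLUMES.values())
shares = type_shares(VOLUMES)

# Print types from largest to smallest share.
for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name:<10} {VOLUMES[name]:>8,} {share:6.1%}")
print(f"{'total':<10} {total_pages:>8,}")
```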

The dataset is collected from 36 public and copyright-clean sources, including academic databases, publishers, and document-sharing platforms. It covers diverse domains such as academia, education, news, economics, and more.

Dataset Highlights

  • Large Scale: approximately 1M document pages and about 48M annotated layout elements.
  • Diverse Types: six challenging document types from 36 public and copyright-clean sources.
  • Reading Order: annotations follow a natural reading order, which is important for autoregressive layout generation.
  • Quality Assessment: in a blind human evaluation, more than 92% of sampled annotations were judged comparable in quality to manual annotations.

[Figure: Dataset statistics]
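
Reading order matters for an autoregressive model because the element sequence it is trained on is also the order in which it emits elements. As a rough illustration only, the sketch below turns an ordered list of elements into one text line per element with coordinates quantized to integer bins. The element keys (`category`, `x`, `y`, `w`, `h`), the 1000-bin quantization, and the output format are all assumptions for this example, not the serialization used by OmniDocLayout-LLM.

```python
def serialize_layout(elements, page_w, page_h, bins=1000):
    """Render elements, in the order given, as one line per element.

    Coordinates are scaled to the range [0, bins) so that pages of
    different sizes share one vocabulary of integer positions.
    The element format is hypothetical.
    """

    def quantize(value, extent):
        # Floor division keeps the arithmetic exact for integer inputs.
        return min(bins - 1, max(0, int(value * bins // extent)))

    lines = []
    for e in elements:
        lines.append(
            f'{e["category"]} '
            f'{quantize(e["x"], page_w)} {quantize(e["y"], page_h)} '
            f'{quantize(e["w"], page_w)} {quantize(e["h"], page_h)}'
        )
    return "\n".join(lines)


# Elements are listed in reading order: title first, then body text.
page = [
    {"category": "title", "x": 60, "y": 40, "w": 480, "h": 80},
    {"category": "text", "x": 60, "y": 140, "w": 480, "h": 600},
]
text = serialize_layout(page, page_w=600, page_h=800)
print(text)
```

Because the function preserves input order, shuffling `page` would change the training sequence even though the rendered page looks identical, which is why consistent reading-order annotation is useful.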

🧠 OmniDocLayout-LLM

The core idea of OmniDocLayout-LLM is a two-stage Coarse-to-Fine learning paradigm.

Stage 1: Coarse-grained Learning

In the first stage, the model learns universal document layout principles from OmniDocLayout-1M with coarse-grained labels. This stage helps the model acquire transferable spatial priors, such as:

  • Alignment
  • Non-overlapping arrangement
  • Reading order
  • Spatial grouping
  • ...
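
To make two of these priors concrete, here is a minimal sketch that checks a layout for overlapping boxes and groups boxes by shared left edge. The `(left, top, width, height)` box format, the function names, and the tolerance are assumptions for illustration; they are not the evaluation metrics used in the paper.

```python
from itertools import combinations


def intersection_area(a, b):
    """Area shared by two boxes given as (left, top, width, height)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    w = min(ax + aw, bx + bw) - max(ax, bx)
    h = min(ay + ah, by + bh) - max(ay, by)
    return max(w, 0.0) * max(h, 0.0)


def overlapping_pairs(boxes):
    """Index pairs (i, j), i < j, whose boxes share positive area."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(boxes), 2)
        if intersection_area(a, b) > 0
    ]


def left_aligned_groups(boxes, tol=1.0):
    """Group box indices whose left edge is within tol of the group's first box."""
    groups = []
    for i in sorted(range(len(boxes)), key=lambda k: boxes[k][0]):
        if groups and boxes[i][0] - boxes[groups[-1][0]][0] <= tol:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups


# A title, two text columns, and a figure that intrudes into the right column.
boxes = [
    (50, 40, 500, 60),
    (50, 120, 240, 300),
    (310, 120, 240, 300),
    (300, 400, 100, 50),
]
print(overlapping_pairs(boxes))    # [(2, 3)]
print(left_aligned_groups(boxes))  # [[0, 1], [3], [2]]
```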

Stage 2: Fine-grained Adaptation

In the second stage, the model is adapted to a specific complex document domain with fine-grained labels. For example, a coarse category such as title can be mapped to fine-grained domain-specific categories such as:

  • Title
  • Headline
  • First-level title
  • Second-level title
  • ...

This design allows the model to benefit from large-scale coarse-grained layout knowledge while requiring only limited fine-grained annotations for adaptation.
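
The relationship between the two label granularities can be pictured as a many-to-one mapping. The sketch below uses only the title example given above; the dictionary name, helper functions, and lowercase coarse label are illustrative assumptions, and the real label sets are defined by the dataset and the target domain.

```python
# Fine-grained labels from the example above, all mapped to the coarse "title".
FINE_TO_COARSE = {
    "Title": "title",
    "Headline": "title",
    "First-level title": "title",
    "Second-level title": "title",
}


def coarsen(labels, mapping=FINE_TO_COARSE):
    """Map fine-grained labels to coarse ones.

    Unknown labels raise KeyError so that gaps in the mapping surface early.
    """
    return [mapping[label] for label in labels]


def fine_candidates(coarse, mapping=FINE_TO_COARSE):
    """List the fine-grained labels that a coarse label can expand to."""
    return sorted(fine for fine, c in mapping.items() if c == coarse)


print(coarsen(["Headline", "Second-level title"]))
print(fine_candidates("title"))
```

Going from fine to coarse is a lookup, while going from coarse to fine is ambiguous; resolving that ambiguity is what the second stage has to learn from the fine-grained annotations.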

Our model supports five layout generation tasks:

| Task | Description |
| --- | --- |
| U-Cond | Unconditional layout generation without external constraints. |
| C→S+P | Given element categories, predict sizes and positions. |
| C+S→P | Given element categories and sizes, predict positions. |
| Completion | Complete the remaining layout given a subset of existing elements. |
| Refinement | Recover a clean layout from perturbed geometric attributes. |

[Figure: OmniDocLayout-LLM framework]
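
The five tasks differ only in what the model is given as a condition. As a hedged sketch, the function below derives a condition and a target from one ground-truth layout for each task. The element format, the `keep` and `noise` parameters, the Gaussian perturbation, and the choice to use the full clean layout as the target in every case are assumptions for illustration, not the repository's data pipeline.

```python
import random


def make_task_example(elements, task, keep=1, noise=5.0, seed=0):
    """Return (condition, target) for one task.

    elements: list of dicts with keys category, x, y, w, h (hypothetical format).
    The target is always a copy of the full clean layout.
    """
    if task == "U-Cond":
        condition = []
    elif task == "C→S+P":
        condition = [{"category": e["category"]} for e in elements]
    elif task == "C+S→P":
        condition = [
            {"category": e["category"], "w": e["w"], "h": e["h"]} for e in elements
        ]
    elif task == "Completion":
        # Reveal the first `keep` elements; the rest must be generated.
        condition = [dict(e) for e in elements[:keep]]
    elif task == "Refinement":
        # Add Gaussian noise to every geometric attribute, keep categories.
        rng = random.Random(seed)
        condition = [
            {**e, **{k: e[k] + rng.gauss(0, noise) for k in ("x", "y", "w", "h")}}
            for e in elements
        ]
    else:
        raise ValueError(f"unknown task: {task}")
    return condition, [dict(e) for e in elements]


layout = [
    {"category": "title", "x": 50, "y": 40, "w": 500, "h": 60},
    {"category": "text", "x": 50, "y": 120, "w": 500, "h": 300},
]
for task in ("U-Cond", "C→S+P", "C+S→P", "Completion", "Refinement"):
    condition, target = make_task_example(layout, task)
    print(task, condition)
```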

📊 Performance

The figure below compares visual examples from several methods on the U-Cond task. For general-purpose LLMs, we adopt the strongest 5-shot setting.

[Figure: Performance comparison]

😄 Acknowledgement

We thank the developers of the following projects and tools:

  • MinerU for document parsing and layout annotation.
  • DocLayout-YOLO for dense document layout detection.
  • SWIFT for model training and inference.

📜 Citation

If you find this project useful for your research, please consider giving us a star and citing our paper:

@inproceedings{kang2026omnidoclayout,
  title={OmniDocLayout: Towards Diverse Document Layout Generation via Coarse-to-Fine LLM Learning},
  author={Kang, Hengrui and Gu, Zhuangcheng and Zhao, Zhiyuan and Wen, Zichen and Wang, Bin and Li, Weijia and He, Conghui},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={3208--3218},
  year={2026}
}

About

[CVPR26 Highlight] The official implementation of the paper "OmniDocLayout: Towards Diverse Document Layout Generation via Coarse-to-Fine LLM Learning"
