GitHub - opendatalab/OmniDocLayout: [CVPR26 Highlight] The official implementation of the paper "OmniDocLayout: Towards Diverse Document Layout Generation via Coarse-to-Fine LLM Learning"

OmniDocLayout: Towards Diverse Document Layout Generation via
Coarse-to-Fine LLM Learning

Hengrui Kang*, Zhuangcheng Gu*, Zhiyuan Zhao, Zichen Wen, Bin Wang, Weijia Li†, Conghui He†

📢 Latest News

[2026.06.10]: 🔥 The OmniDocLayout-1M dataset is now available on Hugging Face. Click here to download it.
[2026.04.09]: OmniDocLayout has been selected as a "Highlight" paper at CVPR 2026! 🎉🎉🎉
[2026.02.21]: OmniDocLayout has been accepted by CVPR 2026! 🎉🎉🎉
[2025.11.24]: We have released our paper on arXiv. Check out the paper here.

📝 Overview

Document AI has advanced rapidly and attracted increasing attention in both academia and industry. However, while most existing efforts focus on document layout analysis (DLA), its generative counterpart, document layout generation, remains relatively underexplored.

Compared with traditional graphic layout design or room layout planning, document layout generation is more challenging because each page usually contains a larger number of elements and exhibits more diverse structural patterns. Existing document layout generation datasets are often dominated by simple academic paper layouts, while modern and complex document types such as newspapers, magazines, textbooks, exam papers, and slides remain underrepresented.

To address these limitations, we introduce OmniDocLayout, a new framework for diverse document layout generation. It has two main components:

OmniDocLayout-1M: the first million-scale dataset for diverse document layout generation, covering six common document types and approximately 48M annotated layout elements.
OmniDocLayout-LLM: a lightweight 0.5B LLM trained with a two-stage Coarse-to-Fine learning paradigm, which first learns general layout principles from OmniDocLayout-1M and then adapts to fine-grained complex document domains.

🏆 Contribution

We introduce OmniDocLayout-1M, the first million-scale document layout dataset for diverse document layout generation, covering six common document types: textbook, newspaper, magazine, exam, academic paper, and slide.
We propose OmniDocLayout-LLM, a lightweight 0.5B model trained with a Coarse-to-Fine learning paradigm, enabling effective transfer from coarse document layout principles to fine-grained complex domains.
Extensive experiments on M⁶Doc demonstrate that OmniDocLayout-LLM achieves strong performance across multiple document types and layout generation tasks.

📦 Dataset

OmniDocLayout-1M is designed to support large-scale training for document layout generation. It covers six common document types from real-world scenarios:

Type	File	Volume
Textbook	`textbook.json`	200,000
Newspaper	`newspaper.json`	207,679
Magazine	`magazine.json`	195,008
Exam paper	`exam.json`	90,360
Academic paper	`academic.json`	200,000
Slide	`slide.json`	100,000
Total	-	993,047
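
As a quick sanity check on the table above, the following sketch recomputes the total and the share of each document type. The page counts and file names are copied from the table; the element figure is the approximate 48M stated in this README, so the per-page average is only a rough estimate. The names `PAGE_COUNTS`, `APPROX_ELEMENTS`, and `summarize` are illustrative and not part of the repository.

```python
# Page counts per file, copied from the dataset table.
PAGE_COUNTS = {
    "textbook.json": 200_000,
    "newspaper.json": 207_679,
    "magazine.json": 195_008,
    "exam.json": 90_360,
    "academic.json": 200_000,
    "slide.json": 100_000,
}

# The README states "approximately 48M" elements, so this is a rounded figure.
APPROX_ELEMENTS = 48_000_000


def summarize(counts):
    """Return the total page count and each file's share of that total."""
    total = sum(counts.values())
    shares = {name: n / total for name, n in counts.items()}
    return total, shares


total, shares = summarize(PAGE_COUNTS)
avg_elements_per_page = APPROX_ELEMENTS / total

print(f"total pages: {total:,}")
for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name:<16} {share:6.1%}")
print(f"approx. elements per page: {avg_elements_per_page:.1f}")
```

The recomputed total matches the 993,047 pages listed in the table, and the rounded element count works out to roughly 48 elements per page.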

The dataset is collected from 36 public and copyright-clean sources, including academic databases, publishers, and document-sharing platforms. It covers diverse domains such as academia, education, news, economics, and more.

Dataset Highlights

Large Scale: approximately 1M document pages and about 48M annotated layout elements.
Diverse Types: 6 challenging and complicated document types from 36 public and copyright-clean sources.
Reading Order: annotations follow a natural reading order, which is important for autoregressive layout generation.
Quality Assessment: in a blind human evaluation, more than 92% of sampled annotations were judged comparable in quality to manual annotations.

🧠 OmniDocLayout-LLM

The core idea of OmniDocLayout-LLM is a two-stage Coarse-to-Fine learning paradigm.

Stage 1: Coarse-grained Learning

In the first stage, the model learns universal document layout principles from OmniDocLayout-1M with coarse-grained labels. This stage helps the model acquire transferable spatial priors, such as:

Alignment
Non-overlapping arrangement
Reading order
Spatial grouping
...
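
To make two of these priors concrete, here is a minimal sketch that measures non-overlap and left alignment on a toy page. It assumes a hypothetical element representation (an axis-aligned `Box` in page coordinates) and a hypothetical tolerance `tol`; it is not the metric or data format used in the paper.

```python
from itertools import combinations
from typing import NamedTuple


class Box(NamedTuple):
    # Hypothetical element representation: axis-aligned box in page coordinates.
    x0: float
    y0: float
    x1: float
    y1: float


def intersection_area(a: Box, b: Box) -> float:
    """Area shared by two boxes; 0.0 if they do not overlap."""
    w = min(a.x1, b.x1) - max(a.x0, b.x0)
    h = min(a.y1, b.y1) - max(a.y0, b.y0)
    return max(w, 0.0) * max(h, 0.0)


def overlapping_pairs(boxes):
    """Index pairs of boxes that violate the non-overlapping prior."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(boxes), 2)
        if intersection_area(a, b) > 0
    ]


def left_aligned_groups(boxes, tol=1.0):
    """Group box indices whose left edges lie within `tol` of the group's first edge."""
    groups = []
    for i in sorted(range(len(boxes)), key=lambda k: boxes[k].x0):
        if groups and boxes[i].x0 - boxes[groups[-1][0]].x0 <= tol:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups


# Toy page: a heading, two columns, and one stray box that overlaps both columns.
page = [
    Box(50, 40, 550, 80),
    Box(50, 100, 290, 400),
    Box(310, 100, 550, 400),
    Box(280, 390, 330, 420),
]

print(overlapping_pairs(page))
print(left_aligned_groups(page))
```

On this toy page the stray fourth box is flagged against both columns, while the heading and the left column share a left edge.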

Stage 2: Fine-grained Adaptation

In the second stage, the model is adapted to a specific complex document domain with fine-grained labels. For example, a coarse category such as title can be mapped to fine-grained domain-specific categories such as:

Title
Headline
First-level title
Second-level title
...

This design allows the model to benefit from large-scale coarse-grained layout knowledge while requiring only limited fine-grained annotations for adaptation.
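
The coarse-to-fine label relationship can be pictured as a many-to-one mapping. The sketch below uses only the title example given above; the dictionary name, the helper functions, and the exact label strings are assumptions for illustration, and the real label sets depend on the target domain.

```python
# Fine-grained labels from the example above, all mapped to the coarse "title" category.
# The exact label strings are illustrative, not the dataset's actual vocabulary.
FINE_TO_COARSE = {
    "Title": "title",
    "Headline": "title",
    "First-level title": "title",
    "Second-level title": "title",
}


def to_coarse(fine_label: str) -> str:
    """Collapse a fine-grained label to its coarse category."""
    try:
        return FINE_TO_COARSE[fine_label]
    except KeyError:
        raise KeyError(f"no coarse category registered for {fine_label!r}") from None


def fine_candidates(coarse_label: str) -> list:
    """List the fine-grained labels that refine a coarse category."""
    return [fine for fine, coarse in FINE_TO_COARSE.items() if coarse == coarse_label]


print(to_coarse("Headline"))
print(fine_candidates("title"))
```

Read in one direction, the mapping collapses fine labels for Stage 1 style coarse supervision; read in the other, it lists the distinctions Stage 2 has to learn.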

Our model supports five layout generation tasks:

Task	Description
U-Cond	Unconditional layout generation without external constraints.
C→S+P	Given element categories, predict sizes and positions.
C+S→P	Given element categories and sizes, predict positions.
Completion	Complete the remaining layout given a subset of existing elements.
Refinement	Recover a clean layout from perturbed geometric attributes.
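
The five tasks differ mainly in what is given to the model. The sketch below derives the given part of each task from one ground-truth layout. It is only an illustration: the element format (dicts with `category`, `x`, `y`, `w`, `h` in normalized coordinates), the prefix-based subset for Completion, and the Gaussian noise for Refinement are assumptions, not the prompt format or perturbation used in the paper.

```python
import random


def make_condition(layout, task, rng, keep=0.5, noise=0.02):
    """Build the part of a layout that is given to the model for each task."""
    if task == "U-Cond":
        return []  # no external constraints
    if task == "C→S+P":
        return [{"category": e["category"]} for e in layout]
    if task == "C+S→P":
        return [{"category": e["category"], "w": e["w"], "h": e["h"]} for e in layout]
    if task == "Completion":
        # Assumption: the known subset is a prefix of the elements in reading order.
        k = max(1, int(len(layout) * keep))
        return [dict(e) for e in layout[:k]]
    if task == "Refinement":
        # Assumption: geometric attributes are perturbed with Gaussian noise.
        return [
            {**e, **{key: e[key] + rng.gauss(0.0, noise) for key in ("x", "y", "w", "h")}}
            for e in layout
        ]
    raise ValueError(f"unknown task: {task!r}")


# Toy layout in a hypothetical normalized [0, 1] coordinate system.
layout = [
    {"category": "title", "x": 0.1, "y": 0.05, "w": 0.8, "h": 0.08},
    {"category": "text", "x": 0.1, "y": 0.18, "w": 0.38, "h": 0.6},
    {"category": "figure", "x": 0.52, "y": 0.18, "w": 0.38, "h": 0.3},
    {"category": "text", "x": 0.52, "y": 0.5, "w": 0.38, "h": 0.28},
]

rng = random.Random(0)
for task in ("U-Cond", "C→S+P", "C+S→P", "Completion", "Refinement"):
    print(task, make_condition(layout, task, rng))
```

In every case the target would be the full clean layout, so the tasks differ only in how much of it is revealed or corrupted.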

📊 Performance

We compare visual examples from several methods on the U-Cond task. For general-purpose LLMs, we adopt the strongest 5-shot setting.

😄 Acknowledgement

We thank the developers of the following projects and tools:

MinerU for document parsing and layout annotation.
DocLayout-YOLO for dense document layout detection.
SWIFT for model training and inference.

📜 Citation

If you find this project useful for your research, please consider giving us a star and citing our paper:

@inproceedings{kang2026omnidoclayout,
  title={OmniDocLayout: Towards Diverse Document Layout Generation via Coarse-to-Fine LLM Learning},
  author={Kang, Hengrui and Gu, Zhuangcheng and Zhao, Zhiyuan and Wen, Zichen and Wang, Bin and Li, Weijia and He, Conghui},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={3208--3218},
  year={2026}
}
