[Feature] Adaptation and Optimization of DeepSeek-OCR2 Multimodal OCR Model

### Checklist

- [x] If this is not a feature request but a general question, please start a discussion at https://github.com/sgl-project/sglang/discussions. Otherwise, it will be closed.
- [x] Please use English. Otherwise, it will be closed.

### Motivation

Adapt and optimize **DeepSeek-OCR2**, a high-precision text-image recognition model, for SGLang. The model supports multilingual text recognition, handwritten character recognition, and layout parsing of complex documents.
The goal is to adapt the full pipeline on the SGLang framework, covering image preprocessing, visual encoding, and text decoding. The adaptation should be compatible with the computing characteristics of Ascend NPU, reduce the computing resources consumed by image inference, and ensure that both OCR recognition accuracy and inference speed meet operational requirements.

### Related resources

_No response_

[Feature] Adaptation and Optimization of DeepSeek-OCR2 Multimodal OCR Model #536
