[Feature] Adaptation and Optimization of DeepSeek-OCR2 Multimodal OCR Model #536

@loading66

Motivation

Adapt and optimize DeepSeek-OCR2, a high-precision image-to-text recognition model. The model supports multilingual text recognition, handwriting recognition, and layout parsing of complex documents.
Using the SGLang framework, we will adapt the full inference pipeline (image preprocessing, visual encoding, and text decoding) to the computing characteristics of Ascend NPUs. We will also reduce the compute cost of image inference and ensure that both OCR accuracy and inference speed meet operational requirements.

Related resources

No response
