📄 Smart OCR Tool

A modern, fast, and powerful web-based OCR (Optical Character Recognition) tool that allows you to extract text from images and PDF documents instantly. Supports 12 languages including English, Hindi, and Arabic. Built with a focus on ease of use, speed, and a premium user experience.

🌐 Live Demo: arungupta1526.github.io/ocr-tool/

📸 Screenshots

Upload Interface

Users can drag and drop images or PDF files directly into the upload area.

OCR Processing

Real-time progress tracking for each file being processed.

Extracted Text Results

View, copy, or download the extracted text after processing.

✨ Features

  • 🖼️ Image OCR: Extract text from PNG, JPG, JPEG, and WebP images.
  • 📄 PDF Support: Full support for multi-page PDF documents. Each page is processed individually.
  • 🌍 Multi-Language Support: Run OCR in 12 different languages (English, Hindi, Arabic, French, German, Chinese, etc.).
  • 📰 Multi-Column Layouts: Extract text from 2- or 3-column PDFs and images while preserving reading order.
  • 🚀 Real-time Progress: Track the OCR progress for each file with visual progress bars.
  • ⛔ Cancel Processing: Cancel any individual file's OCR mid-way without stopping others.
  • 💾 Download as Text: Download the extracted text as a .txt file for easy editing and sharing.
  • 📋 Instant Copy: Copy extracted text to your clipboard with a single click (includes "Copied!" feedback).
  • ✨ Modern UI: A clean, responsive interface with smooth animations and dark mode support.
  • 🛠️ Privacy First: All processing happens locally in your browser using WebAssembly. Your files are never uploaded to a server.
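How the Multi-Column Layouts feature preserves reading order is not described here. One plausible approach, sketched below under that assumption, is to split the page into equal-width vertical strips and recognize them left to right. The names `columnRectangles` and `joinColumns` are hypothetical, and equal-width columns are an assumption that will not hold for every document.

```typescript
// Hypothetical helper types and names; not taken from the project source.
interface Rect {
  left: number;
  top: number;
  width: number;
  height: number;
}

// Split a page image into equal-width column regions, returned left to right
// so the strips can be recognized in reading order.
function columnRectangles(pageWidth: number, pageHeight: number, columns: 1 | 2 | 3): Rect[] {
  const rects: Rect[] = [];
  for (let i = 0; i < columns; i++) {
    // Round both edges so adjacent strips share a boundary and cover the page exactly.
    const left = Math.round((i * pageWidth) / columns);
    const right = Math.round(((i + 1) * pageWidth) / columns);
    rects.push({ left, top: 0, width: right - left, height: pageHeight });
  }
  return rects;
}

// Join the text recognized for each strip in column order, skipping empty strips.
function joinColumns(columnTexts: string[]): string {
  return columnTexts
    .map((text) => text.trim())
    .filter((text) => text.length > 0)
    .join("\n\n");
}
```

Each rectangle could then be handed to the OCR engine as a region of interest (Tesseract.js accepts a `rectangle` option on `recognize`), though whether this project does so is not stated.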

🏗 Architecture

Smart OCR Tool runs entirely in the browser with no backend.

Architecture Diagram

Browser UI (React)
        ↓
PDF.js → Canvas
        ↓
Tesseract.js (WASM OCR)
        ↓
Extracted Text

Processing Flow

User Upload → Select Language & Layout → File Queue → OCR Engine → Extracted Text

  1. User uploads images or PDFs.
  2. User selects desired OCR language (e.g., English, Hindi) and Column Layout (1, 2, or 3 columns).
  3. Files enter a processing queue.
  4. If a file is a PDF:
    • PDF.js renders pages to a canvas.
  5. Canvas images are passed to Tesseract.js.
  6. Tesseract performs OCR using WebAssembly.
  7. Extracted text is displayed in the results panel.
User File
   ↓
Upload Queue
   ↓
PDF.js Rendering
   ↓
Canvas Image
   ↓
Tesseract.js OCR
   ↓
Extracted Text
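
The flow above can be sketched as a single function that branches on file type. This is a minimal illustration, not the project's actual code: `OcrDeps`, `extractText`, and the injected functions are hypothetical stand-ins for the PDF.js rendering step and the Tesseract.js recognition step.

```typescript
// Hypothetical shapes; the real project may structure this differently.
type PageImage = { width: number; height: number };

interface InputFile {
  name: string;
  type: string; // MIME type, e.g. "application/pdf" or "image/png"
}

interface OcrDeps {
  // Stands in for PDF.js rendering each page to a canvas.
  renderPdfPages(file: InputFile): Promise<PageImage[]>;
  // Stands in for decoding a single image file.
  loadImage(file: InputFile): Promise<PageImage>;
  // Stands in for a Tesseract.js recognition call.
  recognize(image: PageImage, language: string): Promise<string>;
}

async function extractText(file: InputFile, language: string, deps: OcrDeps): Promise<string> {
  // PDFs are expanded into one image per page; images pass through as a single page.
  const pages =
    file.type === "application/pdf"
      ? await deps.renderPdfPages(file)
      : [await deps.loadImage(file)];

  const texts: string[] = [];
  for (const page of pages) {
    // Pages are recognized sequentially so the output keeps page order.
    texts.push(await deps.recognize(page, language));
  }
  return texts.join("\n\n");
}
```

Injecting the rendering and recognition steps keeps the control flow easy to follow and lets it be exercised without a browser.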

⚡ Performance Considerations

Per-File Abort Control

Each OCR job uses an AbortController so individual files can be cancelled without stopping the entire queue.
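
A minimal sketch of the per-file pattern, assuming one controller per queued file kept in a map. `runJob`, `cancelJob`, and `ocrPages` are hypothetical names, and the timer only simulates per-page OCR work. How the signal is wired into the OCR engine in this project is not stated.

```typescript
// Hypothetical job registry: one AbortController per file being processed.
const controllers = new Map<string, AbortController>();

// `work` stands in for the per-file OCR routine; it must observe the signal itself.
async function runJob(
  fileId: string,
  work: (signal: AbortSignal) => Promise<string>
): Promise<string | null> {
  const controller = new AbortController();
  controllers.set(fileId, controller);
  try {
    return await work(controller.signal);
  } catch (err) {
    // A failure after abort() is treated as a cancellation, not an error.
    if (controller.signal.aborted) return null;
    throw err;
  } finally {
    controllers.delete(fileId);
  }
}

// Cancels one file; jobs for other files keep their own controllers and continue.
function cancelJob(fileId: string): void {
  controllers.get(fileId)?.abort();
}

// Example of cooperative work: the signal is checked before each page.
async function ocrPages(pages: string[], signal: AbortSignal): Promise<string> {
  const out: string[] = [];
  for (const page of pages) {
    if (signal.aborted) throw new Error("OCR cancelled");
    // Placeholder delay standing in for the real per-page OCR call.
    await new Promise((resolve) => setTimeout(resolve, 5));
    out.push(`text of ${page}`);
  }
  return out.join("\n");
}
```

Cancellation here is cooperative: a page that is already being recognized finishes before the job stops, so the cancel takes effect at the next page boundary.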

High Resolution Rendering

PDF pages are rendered at 2× scale before OCR to improve recognition accuracy.
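
A sketch of the 2× rendering step. The small interfaces below are hypothetical structural types covering only the calls used; `getViewport({ scale })` and `render({ canvasContext, viewport }).promise` follow the commonly documented PDF.js page API, but argument names should be checked against the installed PDF.js version.

```typescript
// Minimal structural types (assumptions) so the sketch stands alone without PDF.js.
interface Viewport {
  width: number;
  height: number;
}

interface PdfPageLike {
  getViewport(params: { scale: number }): Viewport;
  render(params: { canvasContext: unknown; viewport: Viewport }): { promise: Promise<void> };
}

interface CanvasLike {
  width: number;
  height: number;
  getContext(kind: "2d"): unknown;
}

// Scale factor taken from the description above.
const OCR_RENDER_SCALE = 2;

async function renderPageForOcr(page: PdfPageLike, canvas: CanvasLike): Promise<CanvasLike> {
  // A scale of 2 doubles both dimensions, giving four times the pixels of a 1x render.
  const viewport = page.getViewport({ scale: OCR_RENDER_SCALE });
  canvas.width = Math.floor(viewport.width);
  canvas.height = Math.floor(viewport.height);
  await page.render({ canvasContext: canvas.getContext("2d"), viewport }).promise;
  return canvas;
}
```

For a US Letter page (612 × 792 points at scale 1), this produces a 1224 × 1584 pixel canvas. Larger glyphs generally help recognition, at the cost of more memory per page.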

Memory Management

Canvas bitmaps are released after processing each page to avoid memory leaks when processing large PDFs.
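
The exact release mechanism is not described here. A common browser technique, shown below as an assumption, is to set the canvas dimensions to zero once a page has been recognized, which discards its bitmap. `releaseCanvas` and `withCanvas` are hypothetical helper names.

```typescript
// Only the two properties touched here are required, so a real canvas element also fits.
interface ResizableCanvas {
  width: number;
  height: number;
}

// Resizing a canvas resets its bitmap; a 0 x 0 size leaves nothing to keep in memory.
function releaseCanvas(canvas: ResizableCanvas): void {
  canvas.width = 0;
  canvas.height = 0;
}

// Runs `use` with the canvas and always releases it afterwards, even if `use` throws.
async function withCanvas<C extends ResizableCanvas, T>(
  canvas: C,
  use: (canvas: C) => Promise<T>
): Promise<T> {
  try {
    return await use(canvas);
  } finally {
    releaseCanvas(canvas);
  }
}
```

Wrapping the per-page OCR call in `withCanvas` means a failed or cancelled page releases its bitmap too.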

Local Processing

All OCR runs locally using WebAssembly, eliminating network latency and ensuring full privacy.

🚀 Tech Stack

Built with React, Tesseract.js, and PDF.js.

🛠️ Installation & Setup

  1. Clone the repository:

    git clone https://github.com/arungupta1526/ocr-tool.git
    cd ocr-tool
  2. Install dependencies:

    npm install
  3. Run the development server:

    npm run dev
  4. Build for production:

    npm run build

📖 How to Use

  1. Upload: Drag and drop your images or PDF files into the upload area, or click to browse.
  2. Language & Layout: Select your language and the document's column layout (1, 2, or 3 columns) before processing.
  3. Process: Once your files are in the queue, click the "Start OCR" button.
  4. Cancel (optional): Click the ✕ button next to any file to cancel its processing individually.
  5. Review: Switch to the "Results" tab to view the extracted text for each file.
  6. Copy or Download: Click "Copy" to copy text to clipboard, or "Text" to download as a .txt file.

🎯 Use Cases

  • Extract text from scanned documents
  • Convert image-based PDFs to editable text
  • Quickly copy text from screenshots
  • OCR for research papers or notes

🀝 Contributing

Contributions are welcome! If you have ideas for improvements or new features, feel free to open an issue or submit a pull request.

🐳 Docker

Want to self-host or run this in a container? See the Docker Guide for full instructions including Dockerfile, build, run, and Docker Compose setup.

📜 License

Copyright (c) 2026 Arun Gupta

Distributed under the MIT License. See LICENSE for more information.

📞 Commercial Support

If your company needs custom OCR features, integrations, accuracy improvements, or help with enterprise use, contact arungupta1526@gmail.com or reach out on LinkedIn.


Made with ❤️ by Arun Gupta
