Skip to content

Kadermiyanyedi/ocr-supported-react-agent

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Building an OCR-Powered ReAct Agent from Scratch with LangGraph and Gemini

A blog post demo project demonstrating how to build a multimodal ReAct agent from scratch using LangGraph and Gemini. This agent can analyze invoice and receipt images and save the data to Excel.

Features

  • Multimodal OCR: Invoice/receipt analysis with Gemini's vision capabilities
  • ReAct Pattern: Reasoning + acting loop with LangGraph
  • Excel Integration: Save and query analyzed invoices
  • CLI Interface: User-friendly command line experience with Typer

Installation

# Install dependencies
uv sync

# Copy the environment template and add your API key
cp .env.template .env
# Edit .env and add your Gemini API key

Usage

Analyze Invoice

# Analyze an invoice
uv run python main.py analyze invoice.png

# Analyze and save to Excel
uv run python main.py analyze invoice.png --save

List Saved Invoices

uv run python main.py list

Ask Questions About Invoices

uv run python main.py ask "Which invoice has the highest total?"

Interactive Mode

uv run python main.py chat

Project Structure

├── main.py              # CLI entry point
├── src/
│   └── agent/
│       ├── graph.py     # LangGraph agent definition
│       ├── nodes.py     # Agent nodes
│       ├── state.py     # Agent state definition
│       ├── tools.py     # Excel tools
│       └── prompts.py   # System prompt
└── invoices.xlsx        # Saved invoices

Supported Image Formats

  • PNG
  • JPEG/JPG
  • GIF
  • WebP

Requirements

  • Python 3.12+
  • Google Gemini API key

About

OCR-supported ReAct agent built with LangGraph and Gemini. Extract data from images, automate workflows, and integrate tools seamlessly.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages