Building an OCR-Powered ReAct Agent from Scratch with LangGraph and Gemini

A blog post demo project demonstrating how to build a multimodal ReAct agent from scratch using LangGraph and Gemini. This agent can analyze invoice and receipt images and save the data to Excel.

Features

Multimodal OCR: Invoice/receipt analysis with Gemini's vision capabilities
ReAct Pattern: Reasoning + acting loop with LangGraph
Excel Integration: Save and query analyzed invoices
CLI Interface: User-friendly command line experience with Typer

Installation

# Install dependencies
uv sync

# Copy the environment template and add your API key
cp .env.template .env
# Edit .env and add your Gemini API key

Usage

Analyze Invoice

# Analyze an invoice
uv run python main.py analyze invoice.png

# Analyze and save to Excel
uv run python main.py analyze invoice.png --save

List Saved Invoices

uv run python main.py list

Ask Questions About Invoices

uv run python main.py ask "Which invoice has the highest total?"

Interactive Mode

uv run python main.py chat

Project Structure

├── main.py              # CLI entry point
├── src/
│   └── agent/
│       ├── graph.py     # LangGraph agent definition
│       ├── nodes.py     # Agent nodes
│       ├── state.py     # Agent state definition
│       ├── tools.py     # Excel tools
│       └── prompts.py   # System prompt
└── invoices.xlsx        # Saved invoices

Supported Image Formats

PNG
JPEG/JPG
GIF
WebP

Requirements

Python 3.12+
Google Gemini API key

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
src		src
.env.template		.env.template
.gitignore		.gitignore
.python-version		.python-version
README.md		README.md
invoice.png		invoice.png
invoices.xlsx		invoices.xlsx
main.py		main.py
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Building an OCR-Powered ReAct Agent from Scratch with LangGraph and Gemini

Features

Installation

Usage

Analyze Invoice

List Saved Invoices

Ask Questions About Invoices

Interactive Mode

Project Structure

Supported Image Formats

Requirements

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Building an OCR-Powered ReAct Agent from Scratch with LangGraph and Gemini

Features

Installation

Usage

Analyze Invoice

List Saved Invoices

Ask Questions About Invoices

Interactive Mode

Project Structure

Supported Image Formats

Requirements

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages