sahurai/ReAct-Agent


ReAct Agent API

Overview

This project implements a ReAct (Reason + Act) agent using FastAPI and LangChain. Beyond a plain conversational bot, the agent runs a chain of middleware that summarizes long conversation context, falls back to a backup model when the primary one fails, and tracks task lists for multi-step requests.

The service is stateful at runtime but ephemeral across restarts. All conversation history and task state live only in RAM (via MemorySaver) and are lost when the server restarts, so the application writes no conversation data to disk and every restart begins from a clean state.

Workflow Architecture

[Diagram: ReAct Agent Architecture]

The request processing pipeline consists of the following layers:

  1. API Entry Point: Receives the user query and session metadata via FastAPI.
  2. Middleware Layer:
    • Summarization: Analyzes token usage. If the history exceeds 4000 tokens, it compresses the context using a lightweight model (gpt-oss-20b).
    • Todo List: Extracts and manages sub-tasks for complex queries.
  3. Reasoning Node: The primary LLM (llama-3.3-70b) analyzes the context and decides whether to act or answer.
  4. Tool Execution Node: Executes external actions (Calculator or Web Search) if requested by the reasoning node.
  5. Retry & Fallback Layer:
    • Tool Retry: Automatically retries failed tool calls up to 3 times with exponential backoff.
    • Model Fallback: Switches to a larger model (gpt-oss-120b) if the primary model fails.
  6. Response Formatting: Synthesizes the final answer and usage metadata into a structured JSON.
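The model fallback in step 5 can be sketched in plain Python. This is a minimal illustration, not the project's middleware: `invoke_with_fallback` and the `Model` type are hypothetical names, and each "model" is assumed to be a simple callable from prompt to text.

```python
from typing import Callable, Optional, Sequence

# Hypothetical type: a "model" is any callable that maps a prompt to text.
Model = Callable[[str], str]


def invoke_with_fallback(prompt: str, models: Sequence[Model]) -> str:
    """Try each model in order and return the first successful response."""
    last_error: Optional[Exception] = None
    for model in models:
        try:
            return model(prompt)
        except Exception as exc:  # a real service would catch narrower errors
            last_error = exc
    # Every model failed (or none were given): surface the last error as the cause.
    raise RuntimeError("all models failed") from last_error
```

Passing the primary model first and the backup second reproduces the ordering described above: the backup is only consulted when the primary raises.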

Core Features

1. Middleware Orchestration

The agent utilizes a chain of middleware components to enhance reliability and context management:

  • Summarization Middleware: Automatically prevents context window overflow by summarizing conversation history once it passes a threshold (4000 tokens), retaining only the last 20 messages verbatim.
  • Model Fallback Middleware: Provides high availability by seamlessly switching to a backup model (openai/gpt-oss-120b) if the primary inference engine encounters errors.
  • Todo List Middleware: Maintains an internal state of pending tasks, allowing the agent to break down complex user requests into manageable steps.
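The summarization rule above (a 4000-token threshold, with the last 20 messages kept verbatim) can be illustrated with a small standalone sketch. It is not the middleware's actual code: `compact_history`, `rough_token_count`, and the four-characters-per-token estimate are assumptions made for illustration, and messages are modeled as plain strings.

```python
from typing import Callable, List

TOKEN_LIMIT = 4000  # summarization threshold stated above
KEEP_LAST = 20      # number of recent messages kept verbatim


def rough_token_count(messages: List[str]) -> int:
    """Hypothetical estimate: roughly one token per four characters."""
    return sum(len(m) for m in messages) // 4


def compact_history(
    messages: List[str],
    summarize: Callable[[List[str]], str],
    count_tokens: Callable[[List[str]], int] = rough_token_count,
) -> List[str]:
    """Return the history unchanged while it is small, else summary + recent tail."""
    if count_tokens(messages) <= TOKEN_LIMIT or len(messages) <= KEEP_LAST:
        return list(messages)
    older, recent = messages[:-KEEP_LAST], messages[-KEEP_LAST:]
    # Older messages collapse into a single summary entry.
    return [summarize(older)] + recent
```

In the project, `summarize` would correspond to a call to the lightweight summarization model; here it is left as a parameter so the trimming logic can be read on its own.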

2. Tool Integration & Retry

To bridge the gap between language modeling and factual accuracy, the system integrates specific tools:

  • Tavily Search: Used for retrieving real-time information, news, and facts from the web.
  • Numexpr Calculator: Used for precise mathematical evaluations, eliminating LLM arithmetic hallucinations.
  • Auto-Retry: If a tool fails (e.g., API timeout or syntax error), the ToolRetryMiddleware intercepts the error and retries the operation with exponential backoff.
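As a rough sketch of the retry behavior, the helper below reruns a failing call with a delay that doubles on each attempt. The name `retry_with_backoff`, the 0.5-second base delay, and the reading of "up to 3 retries" as one initial attempt plus three retries are assumptions; the real ToolRetryMiddleware may differ in all of these.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(); on failure retry up to max_retries times, doubling the delay."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted: propagate the last error
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the backoff schedule can be checked without actually waiting.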

3. Structured Communication

Unlike simple text streams, the API enforces strict data contracts:

  • Input: Requires a query and a thread_id for session continuity.
  • Output: Returns a ResponseFormat object containing the final answer and a list of specific tools used during the generation process.
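The contract can be pictured with the following sketch. The project validates with Pydantic; standard-library dataclasses are used here only so the example runs without dependencies. `ChatRequest` is a hypothetical name, the non-empty checks are an assumption, and the `ResponseFormat` field names follow the API documentation in this README.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class ChatRequest:  # hypothetical name for the request body
    query: str
    thread_id: str

    def __post_init__(self) -> None:
        # Assumed validation: both fields must be non-empty strings.
        if not self.query.strip():
            raise ValueError("query must not be empty")
        if not self.thread_id.strip():
            raise ValueError("thread_id must not be empty")


@dataclass
class ResponseFormat:
    response: str                                         # final answer text
    tool_usage: List[str] = field(default_factory=list)   # names of tools used


reply = ResponseFormat(response="4", tool_usage=["calculator"])
payload = asdict(reply)  # plain dict, ready to serialize as JSON
```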

Technical Stack

  • API Framework: FastAPI
  • Orchestration: LangChain, LangGraph
  • LLM Inference: Groq Cloud (Llama 3.3 70B, GPT-OSS variants)
  • External Search: Tavily AI Search
  • Math Engine: Numexpr
  • Validation: Pydantic

Installation and Setup

Prerequisites

  • Python 3.10 or higher
  • API Keys for Google, Groq, and Tavily.

1. Clone the Repository

git clone https://github.com/your-username/react-agent-api.git
cd react-agent-api

2. Create Virtual Environment

python -m venv venv
source venv/bin/activate  # Linux/Mac
# venv\Scripts\activate   # Windows

3. Install Dependencies

pip install -r requirements.txt

4. Environment Configuration

Create a .env file in the project root with the following variables:

GOOGLE_API_KEY=your_google_api_key_here
GROQ_API_KEY=your_groq_api_key_here
TAVILY_API_KEY=your_tavily_api_key_here

# Optional: LangSmith Tracing (useful for debugging the graph)
LANGCHAIN_TRACING_V2=true
LANGSMITH_ENDPOINT=https://eu.api.smith.langchain.com
LANGCHAIN_API_KEY=your_langchain_api_key_here
LANGCHAIN_PROJECT="ReAct-Agent"
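A quick way to catch a missing key before starting the server is a small check like the one below. It is a sketch that assumes the values from .env have already been loaded into the process environment; `missing_keys` is a hypothetical helper and not part of the project.

```python
import os
from typing import List, Mapping

# Required variables, taken from the .env listing above.
REQUIRED_KEYS = ("GOOGLE_API_KEY", "GROQ_API_KEY", "TAVILY_API_KEY")


def missing_keys(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or empty."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]
```

Calling `missing_keys()` with no arguments inspects the current environment; passing a dictionary makes the check easy to exercise in isolation.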

5. Run the Server

python main.py

The API will start at http://127.0.0.1:8000. Interactive API documentation (Swagger UI) is available at http://127.0.0.1:8000/docs.


API Documentation

GET /

Simple health check endpoint to verify the service status.

  • Response: JSON with status, service name, and docs link.

POST /api/chat

The primary interface for interaction. Triggers the Agent workflow with middleware support.

  • JSON Body:
    • query: The user's question.
    • thread_id: A unique string identifier for the user session (e.g., "session_001").
  • Response:
    • response: The generated text answer.
    • tool_usage: A list of strings indicating which tools were utilized (e.g., ['tavily_search', 'calculator']).
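A client call to this endpoint might look like the following sketch, which uses only the standard library. The URL is the default local address from the setup section; `build_chat_request` and `send_chat` are hypothetical helper names, and `send_chat` only works while the server is running.

```python
import json
import urllib.request

API_URL = "http://127.0.0.1:8000/api/chat"  # default local address


def build_chat_request(query: str, thread_id: str, url: str = API_URL) -> urllib.request.Request:
    """Build the POST request with the JSON body described above."""
    body = json.dumps({"query": query, "thread_id": thread_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(query: str, thread_id: str) -> dict:
    """Send the request and decode the JSON reply (requires a running server)."""
    with urllib.request.urlopen(build_chat_request(query, thread_id)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Reusing the same `thread_id` across calls keeps them in one session, so the agent can see earlier turns of the conversation.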


Configuration: Changing LLM Models

The project is currently configured to use Groq for high-speed inference. It uses different models for Chat, Summarization, and Fallback scenarios.

Where to Modify

Model definitions are located in app/services/agent.py.

How to Switch Models (Groq)

To change the specific Llama or GPT-OSS version, update the model parameter in the get_agent function:

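# app/services/agent.py
from langchain_groq import ChatGroq

# `settings` is the project's configuration object, which exposes GROQ_API_KEY.

# 1. Main Chat Model
chat_llm = ChatGroq(
    model="llama-3.3-70b-versatile",  # Change to the desired Groq model ID
    temperature=0,  # Keep output as deterministic as possible
    api_key=settings.GROQ_API_KEY
)

# 2. Summarization Model
summarization_llm = ChatGroq(
    model="mixtral-8x7b-32768",  # Example of a different model ID; confirm Groq still offers it
    temperature=0,
    api_key=settings.GROQ_API_KEY
)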

How to Switch Providers (e.g., to OpenAI)

Since the project uses LangChain, switching providers requires minimal code changes.

  1. Install the provider package:
pip install langchain-openai
  2. Update imports and initialization in app/services/agent.py:
from langchain_openai import ChatOpenAI

# Replace ChatGroq with ChatOpenAI
chat_llm = ChatOpenAI(
    model="gpt-4o",
    temperature=0,
    api_key="your_openai_key"  # Placeholder; load the key from configuration instead of hardcoding it
)

Note: Ensure you update the .env file with the necessary API keys for the new provider.


Data Privacy and Ephemeral Storage

This application operates in Ephemeral Mode.

  • In-Memory MemorySaver: The agent's memory (checkpoints) is initialized using MemorySaver(). Conversations exist only in RAM.
  • Server Restart: When the process terminates or the server (uvicorn) restarts, all conversation history, todo lists, and session data are erased. The application itself writes no conversation data to disk.
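The effect of RAM-only storage can be shown with a toy store keyed by thread_id. This is a simplified stand-in written for illustration, not LangGraph's MemorySaver; `InMemoryThreadStore` and its methods are hypothetical.

```python
from typing import Dict, List


class InMemoryThreadStore:
    """Toy in-memory store; its contents disappear with the object."""

    def __init__(self) -> None:
        self._threads: Dict[str, List[str]] = {}

    def append(self, thread_id: str, message: str) -> None:
        """Record a message under the given session."""
        self._threads.setdefault(thread_id, []).append(message)

    def history(self, thread_id: str) -> List[str]:
        """Return a copy of the session's messages (empty if unknown)."""
        return list(self._threads.get(thread_id, []))


store = InMemoryThreadStore()
store.append("session_001", "hello")

# A process restart is equivalent to building a fresh store: nothing carries over.
restarted = InMemoryThreadStore()
```

Because the dictionary lives only inside the process, a new process starts with no sessions at all, which is the behavior described in the bullets above.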
