Skip to content

shanthanu47/Image-generation-council

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Image Generation Council

The Image Generation Council is an advanced AI agentic workflow system designed to optimize the creation of AI-generated images. Inspired by the "LLM Council" architecture, this project orchestrates multiple specialized AI agents to collaborate, critique, and refine image generation prompts, simulating a professional creative studio environment.

Overview

The system operates as a sequential pipeline of "Agents" or "Stages," where specialized models perform distinct roles to ensure high-quality output. It leverages Large Language Models (LLMs) for reasoning and creativity, and Vision Language Models (VLMs) for visual feedback.

The Council Workflow

The workflow consists of four distinct stages:

  1. Stage 1: Prompt Engineering (The Text Council)

    • Role: Expert Prompt Engineers.
    • Function: Several LLMs receive the user's high-level concept and independently draft detailed, chemically-optimized prompts (focusing on lighting, style, composition, and mood).
    • Goal: To translate a vague user request into a precise technical specification.
  2. Stage 2: Image Generation (The Artists)

    • Role: Digital Artists.
    • Function: State-of-the-art image generation models (e.g., Stable Diffusion XL, DALL-E 3) interpret the prompts from Stage 1 to render visual candidates.
    • Goal: To visualize the refined concepts.
  3. Stage 3: Vision Critique (The Critics)

    • Role: Art Critics and Quality Assurance.
    • Function: Vision-enabled models analyze the generated images against the original user request. They evaluate adherence to the prompt, visual fidelity, artifacts, and aesthetic quality.
    • Goal: To provide objective, visual feedback on the results.
  4. Stage 4: Protocol Synthesis (The Chairman)

    • Role: The Chairman / Creative Director.
    • Function: A highly capable LLM reviews the entire history: original request, generated images, and critic reviews. It selects the winning candidate, explains the decision, and offers a final recommendation.
    • Goal: To provide the user with the best result and actionable insight.

Architecture

The project is built using a modern full-stack architecture:

  • Backend: Python 3.12+, FastAPI (Async Web Framework).
  • Frontend: React 19, Vite (Modern Frontend Build Tool).
  • AI Inference:
    • Primary: Local LLM Support via LM Studio (OpenAPI compatible).
    • Secondary: Cloud Support via OpenRouter (Configurable).

Prerequisites

  • Python 3.12+ (with uv or pip for package management).
  • Node.js 18+ (for frontend).
  • LM Studio (for local offline inference).

Installation

  1. Clone the Repository

    git clone https://github.com/yourusername/image-generation-council.git
    cd image-generation-council
  2. Backend Setup Navigate to the root directory and install Python dependencies.

    # Using uv (Recommended)
    uv sync
    
    # Or using standard pip
    pip install -r requirements.txt
  3. Frontend Setup Navigate to the frontend directory.

    cd frontend
    npm install

Local LLM Configuration (LM Studio)

By default, the application is configured to run locally using LM Studio to ensure privacy and eliminate API costs.

  1. Install LM Studio: Download and install from the official website.
  2. Download a Model: Inside LM Studio, search for and download a model (Recommended: Microsoft Phi-3 Mini or Llama 3 8B).
  3. Load the Model: Go to the "Local Server" tab (< > icon) and select the loaded model.
  4. Start Server:
    • Set Port to 1234.
    • Enable CORS (Cross-Origin Resource Sharing).
    • Click Start Server.

Usage

  1. Start the Application We provide a unified launch script for convenience.

    # Windows
    run_app.bat

    Alternatively, run services manually:

    • Backend: uv run python -m backend.main (Port 8002)
    • Frontend: npm run dev (Port 5173)
  2. Access the Interface Open your browser to http://localhost:5173.

  3. Create a Council

    • Click New Conversation.
    • Enter your image concept (e.g., "A cyberpunk robot drinking wine").
    • Watch as the Council deliberates, generates, and critiques your request in real-time.

Customization

You can customize the models used by modifying backend/config.py:

  • CHAIRMAN_MODEL: The model used for final synthesis.
  • TEXT_COUNCIL_MODELS: List of models for prompt engineering.
  • VISION_CRITIC_MODELS: List of vision-capable models for critique.
  • OPENROUTER_API_URL: Point to https://openrouter.ai/api/v1 to use cloud models instead of localhost.

Troubleshooting

  • "Failed to allocate buffer for kv cache": Your local model is too large for your RAM. In LM Studio, lower the "Context Length" to 2048 or use a smaller model like Phi-3 Mini.
  • Connection Error: Ensure LM Studio server is running on port 1234.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors