AutoCaption

AutoCaption is a simple script that generates captions for images using the llava-v1.5-13b vision model.

Installation

To get started, clone the repository and install the necessary dependencies:

git clone --recurse-submodules -j8 https://github.com/akiselevprivate/AutoCaption.git
cd AutoCaption
pip install --upgrade pip  # Enable PEP 660 support
pip install -e LLaVA/      # Install LLaVA module
pip install -r requirements.txt  # Install other dependencies

huggingface-cli login  # Log in with an access token to download the weights; skip if the token is already set as an environment variable

Usage

Once everything is installed, run the script to generate captions for all images in a specified folder.

Basic Command

python main.py <image_folder> --prefix "<prefix>" --suffix "<suffix>" --encoder_prompt "<encoder_prompt>"
  • <image_folder>: Path to the folder containing the images you want to caption.
  • <prefix>: Optional prefix that will be added to the caption (default is empty).
  • <suffix>: Optional suffix that will be added to the caption (default is empty).
  • <encoder_prompt>: Optional addition to the prompt for the encoder model (default is empty).
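
For reference, the arguments above could be parsed along the following lines. This is a minimal sketch using Python's standard argparse module, not the actual contents of main.py; the function name build_parser and the help strings are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented command line."""
    parser = argparse.ArgumentParser(description="Caption every image in a folder.")
    # Positional argument: the folder that holds the images.
    parser.add_argument("image_folder", help="Folder containing the images to caption")
    # Optional arguments all default to an empty string, as documented.
    parser.add_argument("--prefix", default="", help="Text added before each caption")
    parser.add_argument("--suffix", default="", help="Text added after each caption")
    parser.add_argument("--encoder_prompt", default="", help="Extra text added to the prompt")
    return parser


if __name__ == "__main__":
    # Parse an explicit argument list instead of sys.argv to keep the sketch self-contained.
    args = build_parser().parse_args(["images", "--prefix", "Photo of [trigger], "])
    print(args.image_folder, repr(args.prefix), repr(args.suffix), repr(args.encoder_prompt))
```

Any option that is omitted falls back to an empty string, which matches the defaults listed above.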

Example

python main.py images --prefix "Photo of [trigger], " --encoder_prompt "for a t5 text encoder"

This will generate captions for all images in the images/ folder, and each caption will start with "Photo of [trigger], " followed by the description of the image generated by the model.

How It Works

  1. Image Folder: The script reads the images from the folder specified.
  2. Captioning: The script uses a pre-trained model (llava-v1.5-13b) to generate captions.
  3. Prefix/Suffix: You can customize the captions with a prefix and/or suffix.
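
The three steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the script's actual code: the names list_images, build_caption, and caption_folder, the set of image extensions, and the describe callable (a stand-in for the llava-v1.5-13b model call) are all hypothetical.

```python
from pathlib import Path
from typing import Callable, Dict, List

# Assumed set of image extensions; the real script may accept others.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def list_images(folder: str) -> List[Path]:
    """Step 1: collect the image files in the folder, sorted by name."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


def build_caption(description: str, prefix: str = "", suffix: str = "") -> str:
    """Step 3: wrap the model's description with the optional prefix and suffix."""
    return f"{prefix}{description}{suffix}"


def caption_folder(
    folder: str,
    describe: Callable[[Path], str],
    prefix: str = "",
    suffix: str = "",
) -> Dict[str, str]:
    """Step 2 and 3: describe each image, then apply prefix and suffix.

    `describe` is a placeholder for the vision model; it takes an image
    path and returns a text description.
    """
    return {
        path.name: build_caption(describe(path), prefix, suffix)
        for path in list_images(folder)
    }
```

Passing the model as a callable keeps the sketch runnable without downloading weights; for example, caption_folder("images", lambda p: "a cat on a sofa", prefix="Photo of [trigger], ") maps each image file name to a caption that starts with the prefix.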

License

This project is licensed under the MIT License.
