Labelly - Image Captioning App (CNN + LSTM)

📌 Overview

Labelly is an image captioning application that generates natural language descriptions for images using a CNN–LSTM encoder–decoder architecture.
The project combines computer vision and natural language processing to demonstrate multimodal deep learning in an end-to-end pipeline.

The model is trained on the Flickr8k dataset and deployed using Streamlit for interactive inference.

🧠 Architecture

Image → Xception CNN → Image Feature Vector
Text → Tokenizer → Embedding → LSTM Decoder → Caption
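
The two paths above have to meet somewhere before a word can be predicted. The README does not say how, so the following is only a minimal Keras sketch under assumptions: a "merge" design in which the pooled 2048-dimensional Xception feature vector and the LSTM summary of the partial caption are added together. The function name `build_caption_model`, the layer width of 256, the dropout rate, and the example vocabulary size are illustrative choices, not values taken from this repository.

```python
from tensorflow.keras.layers import Input, Dense, Dropout, Embedding, LSTM, add
from tensorflow.keras.models import Model


def build_caption_model(vocab_size, max_length, feature_dim=2048, units=256):
    # Image branch: precomputed Xception features projected to the decoder width.
    image_input = Input(shape=(feature_dim,), name="image_features")
    image_vec = Dense(units, activation="relu")(Dropout(0.5)(image_input))

    # Text branch: the caption generated so far, embedded and summarized by an LSTM.
    caption_input = Input(shape=(max_length,), name="partial_caption")
    embedded = Embedding(vocab_size, units, mask_zero=True)(caption_input)
    text_vec = LSTM(units)(Dropout(0.5)(embedded))

    # Merge both branches and predict a distribution over the next word.
    merged = Dense(units, activation="relu")(add([image_vec, text_vec]))
    next_word = Dense(vocab_size, activation="softmax")(merged)

    model = Model(inputs=[image_input, caption_input], outputs=next_word)
    model.compile(loss="categorical_crossentropy", optimizer="adam")
    return model


# Assumed sizes for illustration only.
model = build_caption_model(vocab_size=1000, max_length=32)
```

The model predicts one word per call, so a full caption is produced by calling it repeatedly with a growing partial caption.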

Key Design Choices

Encoder: Xception (pretrained on ImageNet)
Decoder: LSTM
Training: Teacher forcing
Inference: Greedy decoding
Evaluation Metric: BLEU score
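
Greedy decoding can be illustrated without the trained model. The sketch below assumes the decoder is exposed as a function `predict_next(words)` returning a word-to-probability mapping, and that the start/end tokens are spelled `<start>` and `<end>`. Both are assumptions for this example; `TOY_MODEL` is a hypothetical stand-in with made-up probabilities.

```python
def greedy_decode(predict_next, start="<start>", end="<end>", max_length=20):
    """Build a caption one word at a time, always taking the most probable word.

    predict_next(words) must return a dict mapping each candidate next word
    to its probability, given the words generated so far.
    """
    words = [start]
    for _ in range(max_length):
        probs = predict_next(words)
        best = max(probs, key=probs.get)  # arg-max: the "greedy" choice
        if best == end:
            break
        words.append(best)
    return " ".join(words[1:])  # drop the start token


# Toy stand-in for the trained decoder (hypothetical probabilities).
TOY_MODEL = {
    "<start>": {"a": 0.7, "dog": 0.2, "<end>": 0.1},
    "a": {"dog": 0.6, "a": 0.1, "<end>": 0.3},
    "dog": {"runs": 0.5, "<end>": 0.4, "a": 0.1},
    "runs": {"<end>": 0.9, "a": 0.1},
}


def toy_predict_next(words):
    # Only the last word matters in this toy model.
    return TOY_MODEL[words[-1]]


caption = greedy_decode(toy_predict_next)
print(caption)  # a dog runs
```

With the real model, `predict_next` would tokenize and pad the words, call the decoder with the image features, and map the output probabilities back to words.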

📂 Dataset

Dataset: Flickr8k
Images: about 8,000
Captions: 5 per image (about 40,000 total)

The dataset is not included in this repository due to size and licensing constraints.

⚙️ Tech Stack

Python
TensorFlow / Keras
CNN (Xception)
LSTM
NLP (tokenization, sequence modeling)
Streamlit
NLTK (BLEU evaluation)

🔄 Workflow

Caption preprocessing (cleaning, tokenization, start/end tokens)
Image feature extraction using pretrained CNN
Sequence modeling using LSTM
Training with teacher forcing
Evaluation using BLEU score
Deployment via Streamlit app
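
The first and fourth steps can be sketched in plain Python. This is an illustrative sketch, not the code in `train.py` or `utils.py`: the cleaning rules, the `<start>`/`<end>` token spellings, left-padding with zeros, and all function names are assumptions. It shows how teacher forcing turns one caption into several training pairs, each made of a ground-truth prefix and the word that follows it.

```python
import string


def clean_caption(text):
    """Lowercase, strip punctuation, keep alphabetic tokens, add start/end tokens."""
    table = str.maketrans("", "", string.punctuation)
    words = [w for w in text.lower().translate(table).split() if w.isalpha()]
    return "<start> " + " ".join(words) + " <end>"


def build_vocab(captions):
    """Map each word to an integer id; 0 is reserved for padding."""
    words = sorted({w for c in captions for w in c.split()})
    return {w: i + 1 for i, w in enumerate(words)}


def teacher_forcing_pairs(caption, vocab, max_length):
    """Return (padded ground-truth prefix, next word id) training pairs."""
    ids = [vocab[w] for w in caption.split()]
    pairs = []
    for i in range(1, len(ids)):
        prefix = ids[:i]
        padded = [0] * (max_length - len(prefix)) + prefix  # left-pad with zeros
        pairs.append((padded, ids[i]))
    return pairs


raw = "A dog runs, fast!"
caption = clean_caption(raw)
vocab = build_vocab([caption])
pairs = teacher_forcing_pairs(caption, vocab, max_length=6)

print(caption)  # <start> a dog runs fast <end>
for prefix, target in pairs:
    print(prefix, "->", target)
```

During training the decoder always sees the true prefix, never its own earlier predictions, which is what distinguishes teacher forcing from inference.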

📊 Results

BLEU-1 score: > 0.5 on Flickr8k
Generated captions are generally grammatical and consistent with the main content of the image, though they can be generic (see Limitations)

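Only BLEU-1 (unigram overlap with the reference captions) is reported, given the small dataset and to avoid over-claiming performance. It does not capture word order, so it should be read as a coarse indicator of caption quality.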
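
To make the metric concrete, here is a small pure-Python sketch of sentence-level BLEU-1: clipped unigram precision multiplied by the brevity penalty. It is written for illustration only; the project lists NLTK for evaluation, where `nltk.translate.bleu_score.sentence_bleu` with `weights=(1, 0, 0, 0)` is the usual route, and whether scores were computed per sentence or over the whole corpus is not stated here. The function name `bleu1` and the example captions are made up.

```python
import math
from collections import Counter


def bleu1(references, candidate):
    """Sentence-level BLEU-1: clipped unigram precision times brevity penalty."""
    if not candidate:
        return 0.0
    cand_counts = Counter(candidate)

    # For each word, the largest count seen in any single reference.
    max_ref = Counter()
    for ref in references:
        for word, n in Counter(ref).items():
            max_ref[word] = max(max_ref[word], n)

    clipped = sum(min(n, max_ref[w]) for w, n in cand_counts.items())
    precision = clipped / len(candidate)

    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(candidate)
    r = min((len(ref) for ref in references), key=lambda n: (abs(n - c), n))
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * precision


references = [
    "a dog runs on grass".split(),
    "a brown dog is running".split(),
]
candidate = "a dog runs fast".split()
score = bleu1(references, candidate)
print(round(score, 4))  # 0.5841
```

Three of the four candidate words appear in a reference (precision 0.75), and the candidate is one word shorter than the references, so the brevity penalty exp(-0.25) lowers the score further.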

🖥️ Streamlit App

The application allows users to:

Upload an image
Extract visual features using the pretrained Xception encoder
Generate captions using the trained LSTM decoder

This demonstrates how a deep learning model can be wrapped into a simple interactive application.

⚠️ Limitations

Trained on a relatively small dataset
Captions may be generic
No attention mechanism
Greedy decoding instead of beam search

This project is intended as a learning and demonstration system.

🚀 Future Improvements

Add attention mechanism
Use beam search decoding
Train on larger datasets (e.g., MS-COCO)
Improve caption diversity
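
As a rough idea of what beam search would change, the sketch below keeps the `beam_width` most probable partial captions at each step instead of only one. It is a minimal version under assumptions: no length normalization, a hypothetical `predict_next(words)` function returning word probabilities, and assumed `<start>`/`<end>` token spellings. `TOY_MODEL` uses made-up probabilities chosen so that the greedy choice is not the best full caption.

```python
import math


def beam_search(predict_next, beam_width=3, start="<start>", end="<end>", max_length=20):
    """Keep the beam_width best partial captions by total log-probability."""
    beams = [([start], 0.0)]  # (words, log-probability)
    finished = []
    for _ in range(max_length):
        candidates = []
        for words, score in beams:
            for word, p in predict_next(words).items():
                if p > 0:
                    candidates.append((words + [word], score + math.log(p)))
        candidates.sort(key=lambda c: c[1], reverse=True)

        beams = []
        for words, score in candidates[:beam_width]:
            # Captions that emitted the end token stop growing.
            (finished if words[-1] == end else beams).append((words, score))
        if not beams:
            break

    finished.extend(beams)  # captions cut off at max_length
    best_words, _ = max(finished, key=lambda c: c[1])
    return " ".join(w for w in best_words if w not in (start, end))


# Toy stand-in for the trained decoder (hypothetical probabilities).
TOY_MODEL = {
    "<start>": {"a": 0.6, "two": 0.4},
    "a": {"dog": 0.4, "cat": 0.3, "bird": 0.3},
    "two": {"dogs": 0.9, "cats": 0.1},
    "dog": {"<end>": 1.0},
    "cat": {"<end>": 1.0},
    "bird": {"<end>": 1.0},
    "dogs": {"<end>": 1.0},
    "cats": {"<end>": 1.0},
}


def toy_predict_next(words):
    return TOY_MODEL[words[-1]]


print(beam_search(toy_predict_next, beam_width=1))  # a dog
print(beam_search(toy_predict_next, beam_width=2))  # two dogs
```

A beam width of 1 reproduces greedy decoding ("a dog", probability 0.24), while a width of 2 finds the more probable caption "two dogs" (probability 0.36).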

🎯 One-Line Summary

Built a CNN–LSTM image captioning system with an Xception encoder and an LSTM decoder, trained on Flickr8k, achieving a BLEU-1 score above 0.5, and deployed via Streamlit.
