Labelly is an image captioning application that generates natural language descriptions for images using a CNN–LSTM encoder–decoder architecture.
The project combines computer vision and natural language processing to demonstrate multimodal deep learning in an end-to-end pipeline.
The model is trained on the Flickr8k dataset and deployed using Streamlit for interactive inference.
Image → Xception CNN → Image Feature Vector
Text → Tokenizer → Embedding → LSTM Decoder → Caption
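
The text branch of the diagram can be illustrated with a small pure-Python sketch of how one caption might be turned into teacher-forcing training pairs: at each step the decoder receives the ground-truth prefix and is asked to predict the next word. The token names `startseq` and `endseq`, the helper names, and the left-padding choice are assumptions for illustration and are not taken from the project's code.

```python
def build_vocab(captions):
    """Map each word to an integer id; id 0 is reserved for padding."""
    words = sorted({w for c in captions for w in c.split()})
    return {w: i + 1 for i, w in enumerate(words)}


def teacher_forcing_pairs(caption, vocab, max_len):
    """Expand one caption into (left-padded prefix ids, next-word id) pairs."""
    ids = [vocab[w] for w in caption.split()]
    pairs = []
    for t in range(1, len(ids)):
        prefix = ids[:t][-max_len:]                      # ground-truth words so far
        padded = [0] * (max_len - len(prefix)) + prefix  # pad on the left with 0
        pairs.append((padded, ids[t]))                   # target is the next true word
    return pairs


# Hypothetical captions, already wrapped in start/end tokens.
captions = ["startseq a dog runs endseq", "startseq a child plays endseq"]
vocab = build_vocab(captions)
pairs = teacher_forcing_pairs(captions[0], vocab, max_len=6)

for prefix, target in pairs:
    print(prefix, "->", target)
```

In a Keras pipeline, each prefix would typically be paired with the image feature vector as model input, with the target word as the label.
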
- Encoder: Xception (pretrained on ImageNet)
- Decoder: LSTM
- Training: Teacher forcing
- Inference: Greedy decoding
- Evaluation Metric: BLEU score
- Dataset: Flickr8k
- Images: approximately 8,000
- Captions: 5 per image (approximately 40,000 total)
The dataset is not included in this repository due to size and licensing constraints.
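
Since the data has to be obtained separately, the following sketch shows one way the caption file might be loaded and cleaned. It assumes the commonly distributed layout in which each line reads `<image file>#<index><TAB><caption>`; the sample lines, file names, function names, and the `startseq`/`endseq` tokens are hypothetical, and a given copy of the dataset may use a different layout.

```python
import re
from collections import defaultdict


def clean_caption(text):
    """Lowercase, keep letters only, and wrap the caption in start/end tokens."""
    text = re.sub(r"[^a-z ]", " ", text.lower())
    return "startseq " + " ".join(text.split()) + " endseq"


def parse_captions(lines):
    """Group cleaned captions by image file name."""
    captions = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        image_ref, caption = line.split("\t", 1)
        image_id = image_ref.split("#")[0]  # drop the "#<index>" suffix
        captions[image_id].append(clean_caption(caption))
    return dict(captions)


# Hypothetical sample lines in the assumed format.
sample = [
    "example_1.jpg#0\tA dog runs across the grass .",
    "example_1.jpg#1\tA brown dog is running outside .",
    "example_2.jpg#0\tTwo children play on a beach .",
]
captions = parse_captions(sample)
print(captions["example_1.jpg"])
```
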
- Python
- TensorFlow / Keras
- CNN (Xception)
- LSTM
- NLP (tokenization, sequence modeling)
- Streamlit
- NLTK (BLEU evaluation)
- Caption preprocessing (cleaning, tokenization, start/end tokens)
- Image feature extraction using the pretrained Xception CNN
- Sequence modeling using an LSTM decoder
- Training with teacher forcing
- Evaluation using BLEU score
- Deployment via Streamlit app
- BLEU-1 score: > 0.5 on Flickr8k
- Generated captions are grammatically coherent and semantically aligned with image content
BLEU-1, which measures unigram overlap with the reference captions, is reported because the dataset is small and to avoid over-claiming performance.
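
To make the metric concrete, here is a hand-written sketch of sentence-level BLEU-1: clipped unigram precision multiplied by a brevity penalty. It is an illustration only, not the project's evaluation code, which uses NLTK; there, `sentence_bleu` from `nltk.translate.bleu_score` with `weights=(1, 0, 0, 0)` computes the comparable quantity. The function name `bleu1` and the example sentences are made up.

```python
import math
from collections import Counter


def bleu1(references, hypothesis):
    """Sentence-level BLEU-1 for tokenized references and one hypothesis."""
    if not hypothesis:
        return 0.0
    hyp_counts = Counter(hypothesis)
    # For each word, the largest count seen in any single reference.
    max_ref_counts = Counter()
    for ref in references:
        for word, count in Counter(ref).items():
            max_ref_counts[word] = max(max_ref_counts[word], count)
    # Clip each hypothesis word count by its best reference count.
    clipped = sum(min(c, max_ref_counts[w]) for w, c in hyp_counts.items())
    precision = clipped / len(hypothesis)
    # Brevity penalty uses the reference length closest to the hypothesis length.
    hyp_len = len(hypothesis)
    ref_len = min((len(r) for r in references),
                  key=lambda n: (abs(n - hyp_len), n))
    bp = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * precision


references = ["a dog runs on the grass".split(),
              "a brown dog is running".split()]
hypothesis = "a dog runs in the park".split()
score = bleu1(references, hypothesis)
print(round(score, 3))
```

Here four of the six generated words appear in a reference and the lengths match, so the score is 4/6. A corpus-level score aggregates counts over all test captions and is not simply the mean of sentence scores.
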
The application allows users to:
- Upload an image
- Extract visual features using the pretrained CNN encoder
- Generate captions using the trained LSTM decoder
This demonstrates how a deep learning model can be wrapped into a simple interactive application.
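
The caption generation step can be sketched as a greedy decoding loop: starting from a start token, repeatedly pick the most probable next word until an end token or a length limit is reached. In this minimal sketch the trained model is replaced by a toy lookup table; `greedy_decode`, `toy_next_word_probs`, the table contents, and the `startseq`/`endseq` token names are all hypothetical stand-ins.

```python
def greedy_decode(next_word_probs, image_features, max_len=20,
                  start="startseq", end="endseq"):
    """Build a caption by always taking the most probable next word."""
    words = [start]
    for _ in range(max_len):
        probs = next_word_probs(image_features, words)
        best = max(probs, key=probs.get)  # argmax over the vocabulary
        if best == end:
            break
        words.append(best)
    return " ".join(words[1:])  # drop the start token


# Toy stand-in for the decoder: next-word probabilities keyed by the last word.
TOY_TABLE = {
    "startseq": {"a": 0.9, "dog": 0.1},
    "a": {"dog": 0.7, "cat": 0.3},
    "dog": {"runs": 0.6, "endseq": 0.4},
    "runs": {"endseq": 0.8, "a": 0.2},
}


def toy_next_word_probs(image_features, prefix):
    """Ignore the image and look up probabilities from the last word only."""
    return TOY_TABLE[prefix[-1]]


caption = greedy_decode(toy_next_word_probs, image_features=None)
print(caption)  # a dog runs
```

In the real application, `next_word_probs` would wrap a call to the trained decoder with the image feature vector and the padded word-id prefix.
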
- Trained on a relatively small dataset
- Captions may be generic
- No attention mechanism
- Greedy decoding instead of beam search
This project is intended as a learning and demonstration system.
- Add attention mechanism
- Use beam search decoding
- Train on larger datasets (e.g., MS-COCO)
- Improve caption diversity
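
As a rough idea of what the beam search item might involve, the sketch below keeps the `beam_width` best partial captions at each step instead of a single one, scoring them by summed log-probability. It is not part of the project: the function names, the toy probability table, and the `startseq`/`endseq` tokens are hypothetical, and length normalization is deliberately omitted.

```python
import math


def beam_search(next_word_probs, image_features, beam_width=3, max_len=20,
                start="startseq", end="endseq"):
    """Return the highest-scoring caption found with a fixed-width beam."""
    beams = [(0.0, [start])]  # (log-probability, words so far)
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, words in beams:
            for word, p in next_word_probs(image_features, words).items():
                if p <= 0:
                    continue
                cand = (score + math.log(p), words + [word])
                (finished if word == end else candidates).append(cand)
        if not candidates:
            break
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]  # keep only the best partial captions
    best = max(finished or beams, key=lambda c: c[0])
    return " ".join(w for w in best[1] if w not in (start, end))


# Toy stand-in for the decoder, built so the greedy choice is not the best one.
TOY_TABLE = {
    "startseq": {"a": 0.6, "two": 0.4},
    "a": {"dog": 0.5, "cat": 0.5},
    "two": {"dogs": 1.0},
    "dog": {"endseq": 1.0},
    "cat": {"endseq": 1.0},
    "dogs": {"endseq": 1.0},
}


def toy_next_word_probs(image_features, prefix):
    """Ignore the image and look up probabilities from the last word only."""
    return TOY_TABLE[prefix[-1]]


print(beam_search(toy_next_word_probs, None, beam_width=1))  # a dog
print(beam_search(toy_next_word_probs, None, beam_width=2))  # two dogs
```

With a beam width of 1 the search behaves like greedy decoding and returns a caption with probability 0.3, while a width of 2 finds the caption with probability 0.4.
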
Built a CNN–LSTM image captioning system with an Xception encoder and an LSTM decoder, trained it on Flickr8k to a BLEU-1 score above 0.5, and deployed it via Streamlit.