A small but handy tool that pulls text out of images and reads it aloud. Upload a picture, let Azure do the heavy lifting (OCR + speech synthesis), and get both the extracted text and a downloadable audio file — all from a single browser tab.
Built with Streamlit on the frontend and Azure Cognitive Services on the backend.
- You upload an image (screenshot, photo of a document, scanned page, whatever).
- Azure's Computer Vision API reads every line of text in that image.
- Azure's Speech Service turns that text into natural-sounding speech.
- You get the extracted text on screen, word/character counts, and a `.wav` file you can play or download.
That's it. No accounts to create inside the app, no complicated steps.
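Under the hood, the Read API is asynchronous: the image is submitted first, and the result is then polled until the operation finishes. Below is a minimal stdlib-only sketch of that poll-and-collect step. It assumes the Read 3.2 JSON shape (`status`, then `analyzeResult.readResults[].lines[].text`). `fetch_result` stands in for whatever actually calls Azure, such as the SDK's `get_read_result` or a REST GET on the `Operation-Location` URL. Both helper names are hypothetical and not taken from the app itself.

```python
import time
from typing import Callable, Dict, List


# Hypothetical helper: polls a Read-style operation until it finishes.
# fetch_result is any callable returning the parsed JSON of the
# operation (Read API v3.2 response shape assumed).
def wait_for_read_result(
    fetch_result: Callable[[], Dict],
    poll_interval: float = 1.0,
    max_attempts: int = 30,
) -> Dict:
    for _ in range(max_attempts):
        result = fetch_result()
        status = result.get("status")
        if status == "succeeded":
            return result
        if status == "failed":
            raise RuntimeError("Read operation failed")
        # "notStarted" or "running": wait, then ask again
        time.sleep(poll_interval)
    raise TimeoutError("Read operation did not finish in time")


def extract_lines(result: Dict) -> List[str]:
    """Collect the text of every detected line, page by page."""
    lines: List[str] = []
    for page in result.get("analyzeResult", {}).get("readResults", []):
        for line in page.get("lines", []):
            lines.append(line["text"])
    return lines


if __name__ == "__main__":
    # Fake responses standing in for Azure: one "running", then "succeeded".
    responses = iter([
        {"status": "running"},
        {"status": "succeeded",
         "analyzeResult": {"readResults": [
             {"page": 1, "lines": [{"text": "Hello"}, {"text": "world"}]}
         ]}},
    ])
    result = wait_for_read_result(lambda: next(responses), poll_interval=0)
    # prints "Hello" and "world" on separate lines
    print("\n".join(extract_lines(result)))
```

A fixed polling interval keeps the sketch simple. A real client might use a longer interval or back off between attempts.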
| Layer | Tech |
|---|---|
| UI | Streamlit (Python) |
| OCR | Azure Computer Vision (Read API) |
| Speech | Azure Speech Service (Text-to-Speech) |
| Config | .env file via python-dotenv |
- Python 3.9+ installed on your machine.
- An Azure account with two resources provisioned:
  - Computer Vision: for the OCR part.
  - Speech Service: for text-to-speech.
```bash
git clone https://github.com/prathamtagad/Readify-AI.git
cd Readify-AI
pip install -r requirements.txt
```

Create a `.env` file in the project root (there's already a template in the repo). Fill in your own credentials:
```
VISION_KEY=your_vision_api_key_here
VISION_ENDPOINT=https://your-resource-name.cognitiveservices.azure.com/
SPEECH_KEY=your_speech_api_key_here
SPEECH_REGION=eastus
```
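The app reads these values through python-dotenv, so a startup check along these lines can catch a missing or misspelled name early. This is a minimal sketch, not the app's actual code. `missing_credentials` is a hypothetical helper, and it only checks that each value is present and non-blank, not that it is valid.

```python
import os
from typing import List, Mapping, Optional

# The four variable names used in the .env template above.
REQUIRED_VARS = ("VISION_KEY", "VISION_ENDPOINT", "SPEECH_KEY", "SPEECH_REGION")


def missing_credentials(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    # In the real app, python-dotenv's load_dotenv() would populate
    # os.environ from .env before a check like this runs.
    missing = missing_credentials()
    if missing:
        print("Missing Azure credentials:", ", ".join(missing))
    else:
        print("All Azure credentials found.")
```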
Heads up: Never commit real API keys to a public repo. The `.env` file should be listed in your `.gitignore`.
```bash
streamlit run Abishek_sir_project_azure_ocr_to_speech.py
```

Streamlit will open a new browser tab at http://localhost:8501 (its default port). That's your app.
- Click the upload area and pick an image (PNG, JPG, BMP, TIFF, or WebP).
- Hit "✨ Extract Text & Generate Speech".
- Wait a few seconds — the spinner will show progress.
- Once done, the left panel shows the extracted text with stats (word count, character count), and the right panel has the audio player plus a download button.
- The audio plays automatically in your browser — no extra clicks needed.
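The word and character counts shown in the left panel are simple to derive. The sketch below assumes words are whitespace-separated tokens and that the character count includes spaces and line breaks. The app's exact counting rules aren't documented here, so `text_stats` is a hypothetical helper.

```python
from typing import Dict


def text_stats(text: str) -> Dict[str, int]:
    """Word count (whitespace-separated tokens) and raw character count."""
    return {
        "words": len(text.split()),
        "characters": len(text),
    }


if __name__ == "__main__":
    sample = "Hello world\nfrom Readify"
    print(text_stats(sample))  # {'words': 4, 'characters': 24}
```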
For best results, use images where the text is clearly visible. High-contrast, well-lit photos work best. Blurry or heavily stylized text might not come through perfectly.
```
.
├── Abishek_sir_project_azure_ocr_to_speech.py   # Main app — everything lives here
├── .env                                          # Your Azure API keys (not committed)
└── README.md                                     # You're reading this
```
It's a single-file project on purpose. Keeps things simple and easy to hand off or deploy.
- PNG
- JPEG / JPG
- BMP
- TIFF
- WebP
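In Streamlit, accepted file types are typically passed to `st.file_uploader` through its `type` parameter. As a standalone illustration, here is a hypothetical extension check against the list above. It trusts the file name only and doesn't inspect the image bytes. It also doesn't include `.tif`, because whether the app accepts that extension isn't stated here.

```python
from pathlib import Path

# Extensions matching the supported formats listed above.
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp", ".tiff", ".webp"}


def is_supported_image(filename: str) -> bool:
    """Case-insensitive check of the file extension only."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    for name in ["scan.PNG", "photo.jpeg", "notes.pdf", "README"]:
        print(name, is_supported_image(name))
```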
"Missing Azure credentials" error on launch?
→ Double-check your .env file. Make sure the variable names match exactly: VISION_KEY, VISION_ENDPOINT, SPEECH_KEY, SPEECH_REGION.
OCR returns empty text?
→ The image might be too blurry, too small, or the text might be in a language/script Azure doesn't handle well. Try a cleaner scan.
Speech synthesis fails or gets canceled?
→ Usually a quota or region issue on the Azure side. Verify your Speech resource is active and the region in .env matches the one you created the resource in.
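Some configuration mistakes can be caught before any request is sent. The sketch below is rough and rests on two assumptions: the endpoint should be a full `https` URL, and the region should be a short lowercase identifier such as `eastus`. `config_problems` is hypothetical, and these are heuristics rather than Azure's own validation rules. It cannot detect a region that is well-formed but doesn't match the one your resource was created in.

```python
import re
from typing import List
from urllib.parse import urlparse

# Heuristic: region names look like "eastus", "westeurope", "eastus2".
REGION_PATTERN = re.compile(r"^[a-z0-9]+$")


def config_problems(endpoint: str, region: str) -> List[str]:
    """Return human-readable warnings about obviously malformed values."""
    problems: List[str] = []
    parsed = urlparse(endpoint.strip())
    if parsed.scheme != "https" or not parsed.netloc:
        problems.append("VISION_ENDPOINT should be a full https:// URL")
    if not REGION_PATTERN.match(region.strip()):
        problems.append("SPEECH_REGION should be a lowercase region name like 'eastus'")
    return problems


if __name__ == "__main__":
    print(config_problems("https://my-res.cognitiveservices.azure.com/", "eastus"))
    print(config_problems("my-res.cognitiveservices.azure.com", "East US"))
```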
The easiest way to deploy Readify AI is using Streamlit Community Cloud:
- Push your code to a GitHub repository.
- Go to share.streamlit.io and connect your account.
- Click "New app" and select your repo/branch.
- Important: Before clicking Deploy, go to Advanced Settings -> Secrets and paste your Azure credentials there:
  ```toml
  VISION_KEY = "your_key"
  VISION_ENDPOINT = "your_endpoint"
  SPEECH_KEY = "your_key"
  SPEECH_REGION = "your_region"
  ```
- Click Deploy!
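Streamlit's secrets documentation states that root-level entries like these are also exposed as environment variables, so code that reads `os.getenv("VISION_KEY")` should keep working after deployment. If you'd rather read `st.secrets` explicitly, one hypothetical pattern is to check the environment first and then fall back to any dict-like secrets object. `get_setting` is an illustrative name, not part of the app. It's shown here with a plain dict so it runs without Streamlit installed.

```python
import os
from typing import Mapping, Optional


def get_setting(name: str, secrets: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Prefer a non-empty environment variable; fall back to a secrets mapping."""
    value = os.getenv(name)
    if value:
        return value
    if secrets is not None and name in secrets:
        return secrets[name]
    return None


if __name__ == "__main__":
    # In a Streamlit app you might pass st.secrets here instead of a dict.
    fake_secrets = {"SPEECH_REGION": "eastus"}
    print(get_setting("SPEECH_REGION", fake_secrets))
```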
This project doesn't currently ship with a license file. If you're planning to share it, consider adding an MIT or Apache 2.0 license.
Built by Pratham Tagad — GitHub · Portfolio · prathamtagad0@gmail.com