This guide covers setting up RunPod serverless GPUs for the toolkit's Cloud GPU tools.
| Tool | Backend | Use Case | Cost/Job |
|---|---|---|---|
| image_edit | Qwen-Image-Edit | AI image editing, style transfer | ~$0.01-0.02 |
| upscale | RealESRGAN | AI image upscaling (2x/4x) | ~$0.01-0.05 |
| dewatermark | ProPainter | AI video inpainting | ~$0.05-0.30 |
These tools use AI models that require significant GPU power:
| Hardware | Image Edit | Upscale | Dewatermark (30s) | Viable? |
|---|---|---|---|---|
| NVIDIA RTX 3090+ | ~20s | ~10s | 2-5 min | Yes |
| Apple Silicon | N/A | N/A | 4+ hours | No |
| CPU only | N/A | N/A | 10+ hours | No |
RunPod provides on-demand NVIDIA GPUs at ~$0.34/hour, making it cost-effective for occasional use.
The fastest way to set up RunPod:

    # 1. Add your RunPod API key to .env
    echo "RUNPOD_API_KEY=your_key_here" >> .env

    # 2. Run automated setup for each tool you need
    python tools/image_edit.py --setup    # AI image editing
    python tools/upscale.py --setup       # AI upscaling
    python tools/dewatermark.py --setup   # AI watermark removal

    # 3. Done! Now use them:
    python tools/image_edit.py --input photo.jpg --style cyberpunk
    python tools/upscale.py --input photo.jpg --output photo_4x.png --runpod
    python tools/dewatermark.py --input video.mp4 --region x,y,w,h --output out.mp4 --runpod

Each --setup command will:
- Create a serverless template using a pre-built Docker image
- Create an endpoint with appropriate GPU
- Save the endpoint ID to your .env file
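
The last step, saving the endpoint ID, amounts to an insert-or-update on the .env file. The sketch below illustrates that idea under the assumption of a plain KEY=value file; `upsert_env_var` is a hypothetical helper and not necessarily how the toolkit's --setup code is written.

```python
from pathlib import Path


def upsert_env_var(env_path: str, key: str, value: str) -> None:
    """Set KEY=value in a .env file, replacing an existing entry if present."""
    path = Path(env_path)
    lines = path.read_text().splitlines() if path.exists() else []

    updated = False
    for i, line in enumerate(lines):
        # Only exact key matches count, so comments and longer keys such as
        # RUNPOD_ENDPOINT_ID_FAST are left untouched.
        if "=" in line and line.split("=", 1)[0].strip() == key:
            lines[i] = f"{key}={value}"
            updated = True

    if not updated:
        lines.append(f"{key}={value}")

    path.write_text("\n".join(lines) + "\n")
```

Re-running setup with this approach overwrites the old endpoint ID instead of appending a duplicate line.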
All tools use pre-built public images (no building required):
| Tool | Image |
|---|---|
| image_edit | ghcr.io/conalmullan/video-toolkit-qwen-edit:latest |
| upscale | ghcr.io/conalmullan/video-toolkit-realesrgan:latest |
| dewatermark | ghcr.io/conalmullan/video-toolkit-propainter:latest |
Use --setup-gpu AMPERE_16 for RTX 3080 or --setup-gpu ADA_24 for RTX 4090.
If you prefer to set up manually via the web console:
- Go to runpod.io and sign up
- Add credits to your account ($10 minimum, lasts for many jobs)
- Go to Settings > API Keys and create an API key
Create one endpoint per tool you want to use:
| Tool | Docker Image | GPU | Timeout |
|---|---|---|---|
| image_edit | ghcr.io/conalmullan/video-toolkit-qwen-edit:latest | 48GB+ (A6000, L40S, A100) | 300s |
| upscale | ghcr.io/conalmullan/video-toolkit-realesrgan:latest | 24GB (RTX 3090/4090) | 300s |
| dewatermark | ghcr.io/conalmullan/video-toolkit-propainter:latest | 24GB (RTX 3090/4090) | 3600s |
Steps for each:
- Go to RunPod Serverless Console
- Click New Endpoint
- Configure:
| Setting | Value | Notes |
|---|---|---|
| Docker Image | (see table above) | Pre-built public image |
| GPU | (see table above) | VRAM requirements vary |
| Max Workers | 1 | Scale up for batch processing |
| Idle Timeout | 5 seconds | Fast scale-down to save costs |
| Execution Timeout | (see table above) | Video processing needs longer |
- Click Create Endpoint
- Copy the Endpoint ID (looks like: abc123xyz)
Add to your .env file (only the endpoints you created):

    # RunPod Configuration
    RUNPOD_API_KEY=your_api_key_here

    # Endpoint IDs (one per tool)
    RUNPOD_QWEN_EDIT_ENDPOINT_ID=abc123   # For image_edit
    RUNPOD_UPSCALE_ENDPOINT_ID=def456     # For upscale
    RUNPOD_ENDPOINT_ID=ghi789             # For dewatermark

Example usage:

    # Image editing
    python tools/image_edit.py --input photo.jpg --prompt "Add sunglasses"

    # Upscaling
    python tools/upscale.py --input photo.jpg --output photo_4x.png --runpod

    # Dewatermark (with dry run)
    python tools/dewatermark.py \
      --input video.mp4 \
      --region 1080,660,195,40 \
      --output clean.mp4 \
      --runpod \
      --dry-run

1. Local tool uploads video to temporary storage
2. Submits job to RunPod endpoint
3. RunPod spins up GPU worker (~30s cold start)
4. Worker downloads video, runs ProPainter
5. Worker uploads result, returns URL
6. Local tool downloads result
7. Worker scales down (you stop paying)
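
Steps 2 to 5 boil down to one submit call followed by status polling. The sketch below shows that pattern against RunPod's serverless REST API (`/run` and `/status/{id}`), using only the standard library. The helper names are hypothetical, the 30-minute default mirrors the timeout mentioned in this guide, and the contents of the job input are an assumption because they depend on each worker's handler.

```python
import json
import time
import urllib.request

RUNPOD_API_BASE = "https://api.runpod.ai/v2"
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"}


def runpod_request(api_key, endpoint_id, path, payload=None):
    """Call the RunPod serverless REST API and return the decoded JSON body."""
    url = f"{RUNPOD_API_BASE}/{endpoint_id}/{path}"
    data = json.dumps(payload).encode() if payload is not None else None
    req = urllib.request.Request(
        url,
        data=data,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST" if payload is not None else "GET",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_for_job(fetch_status, timeout_s=1800, poll_s=5.0, sleep=time.sleep):
    """Poll fetch_status() until the job reaches a terminal state or times out."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_status()
        if job.get("status") in TERMINAL_STATES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {job.get('status')} after {timeout_s}s")
        sleep(poll_s)


def run_job(api_key, endpoint_id, job_input, timeout_s=1800):
    """Submit a job (step 2) and wait for its result (steps 3 to 5)."""
    job = runpod_request(api_key, endpoint_id, "run", {"input": job_input})
    return wait_for_job(
        lambda: runpod_request(api_key, endpoint_id, f"status/{job['id']}"),
        timeout_s=timeout_s,
    )
```

Passing the status fetcher in as a callable keeps the polling loop independent of the network, so it can be exercised with a stub before spending GPU time.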
| Video Length | Processing Time | Cost (RTX 3090) |
|---|---|---|
| < 30 seconds | 2-5 minutes | ~$0.02 |
| 30s - 2 min | 5-15 minutes | ~$0.08 |
| 2 - 5 min | 15-45 minutes | ~$0.25 |
| > 5 min | 45+ minutes | ~$0.40+ |
| GPU | VRAM | Cost/hr | Speed | Best For |
|---|---|---|---|---|
| RTX 3090 | 24GB | $0.34 | Fast | Most videos (recommended) |
| RTX 4090 | 24GB | $0.69 | Faster | Tight deadlines |
| A100 | 80GB | $1.99 | Fastest | Very long videos |
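
The per-video costs above roughly follow from the hourly rate multiplied by processing time. Here is a minimal sketch of that arithmetic, assuming simple per-second billing. The `overhead_seconds` parameter (cold start plus idle time) is an assumption about what gets billed on top of execution, so check it against your own usage page.

```python
def estimate_job_cost(
    processing_minutes: float,
    gpu_rate_per_hour: float,
    overhead_seconds: float = 0.0,
) -> float:
    """Estimate the dollar cost of one job under per-second billing."""
    billed_seconds = processing_minutes * 60 + overhead_seconds
    return billed_seconds * gpu_rate_per_hour / 3600


# 15 minutes of processing on an RTX 3090 at $0.34/hr
print(f"${estimate_job_cost(15, 0.34):.3f}")
# Same job on an RTX 4090 at $0.69/hr, ignoring any speed-up
print(f"${estimate_job_cost(15, 0.69):.3f}")
```

A faster GPU only pays off when its speed-up exceeds its price ratio, which is why the RTX 3090 is the default recommendation here.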
- Use 5-second idle timeout - Workers scale down quickly
- Process in batches - Submit multiple videos to same warm worker
- Right-size your GPU - RTX 3090 is plenty for most videos
- Set max workers = 1 initially - Prevents runaway costs
Add your API key to .env:

    RUNPOD_API_KEY=your_key_here

Add your endpoint ID to .env:

    RUNPOD_ENDPOINT_ID=abc123xyz

Default timeout is 30 minutes. For longer videos:

    python tools/dewatermark.py ... --runpod --runpod-timeout 3600

- Check your internet connection
- Verify the video file exists and is readable
- Large files (>500MB) may take several minutes to upload
This is normal for the first request after idle. The worker needs to:
- Spin up the container
- Load PyTorch and models into GPU memory
Subsequent requests to a warm worker are faster.
Check the RunPod logs:
- Go to RunPod Console > Serverless > Your Endpoint > Logs
- Look for error messages from the handler
Common issues:
- Video format not supported (try converting to MP4)
- Region coordinates exceed video dimensions
- GPU ran out of memory (shouldn't happen with 24GB GPUs)
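
An out-of-bounds region can be caught locally before any upload or GPU time is spent. The sketch below parses an `x,y,w,h` string like the one passed to --region and checks it against the frame size. Both function names are hypothetical, and obtaining the video dimensions (for example with ffprobe) is left out.

```python
from typing import Tuple


def parse_region(text: str) -> Tuple[int, int, int, int]:
    """Parse an 'x,y,w,h' string into four integers."""
    parts = text.split(",")
    if len(parts) != 4:
        raise ValueError(f"expected x,y,w,h but got {text!r}")
    x, y, w, h = (int(p) for p in parts)
    return x, y, w, h


def check_region(region: Tuple[int, int, int, int], width: int, height: int) -> None:
    """Raise ValueError if the region does not fit inside a width x height frame."""
    x, y, w, h = region
    if w <= 0 or h <= 0:
        raise ValueError("region width and height must be positive")
    if x < 0 or y < 0 or x + w > width or y + h > height:
        raise ValueError(
            f"region {region} exceeds video dimensions {width}x{height}"
        )


# The example region from this guide fits inside a 1280x720 frame.
check_region(parse_region("1080,660,195,40"), 1280, 720)
```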
By default, videos are uploaded via free file hosting services (litterbox.catbox.moe, etc.). These work but can be unreliable for large files.
Cloudflare R2 provides reliable, fast file transfer with a generous free tier:
- 10 GB storage (we clean up after each job)
- 10 million operations/month
- Zero egress fees (unlike AWS S3)
- No expiration (unlike AWS's 12-month free tier)
- Create Cloudflare Account (free): https://dash.cloudflare.com
- Create R2 Bucket:
  - Go to R2 Object Storage → Create bucket
  - Name: video-toolkit (or any name)
  - Click Create
- Create API Token:
  - R2 → Overview → Manage R2 API Tokens
  - Create API Token → Object Read & Write
  - Specify bucket: video-toolkit
  - Copy the Access Key ID and Secret Access Key (shown once!)
- Get Account ID:
  - Visible in dashboard URL: dash.cloudflare.com/<ACCOUNT_ID>/r2
- Add to .env:

      R2_ACCOUNT_ID=your_account_id
      R2_ACCESS_KEY_ID=your_access_key_id
      R2_SECRET_ACCESS_KEY=your_secret_access_key
      R2_BUCKET_NAME=video-toolkit

- Install boto3 (if not already):

      pip install boto3

That's it! All RunPod tools will automatically use R2 for file transfer when configured.
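
"When configured" can be read as "all four R2 variables are set". The sketch below shows one way such a check and a presigned download URL could look with boto3. The function names are hypothetical, and the 2-hour default expiry mirrors the figure stated in this guide rather than a value taken from the toolkit's code.

```python
import os
from typing import Mapping, Optional

R2_KEYS = (
    "R2_ACCOUNT_ID",
    "R2_ACCESS_KEY_ID",
    "R2_SECRET_ACCESS_KEY",
    "R2_BUCKET_NAME",
)


def r2_settings(env: Mapping[str, str] = os.environ) -> Optional[dict]:
    """Return R2 connection settings, or None if any variable is missing."""
    if not all(env.get(k) for k in R2_KEYS):
        return None
    return {
        "bucket": env["R2_BUCKET_NAME"],
        "endpoint_url": f"https://{env['R2_ACCOUNT_ID']}.r2.cloudflarestorage.com",
        "aws_access_key_id": env["R2_ACCESS_KEY_ID"],
        "aws_secret_access_key": env["R2_SECRET_ACCESS_KEY"],
    }


def presigned_download_url(settings: dict, key: str, expires_s: int = 7200) -> str:
    """Create a time-limited download URL for one object via boto3."""
    import boto3  # imported lazily so the config check works without boto3

    client = boto3.client(
        "s3",
        endpoint_url=settings["endpoint_url"],
        aws_access_key_id=settings["aws_access_key_id"],
        aws_secret_access_key=settings["aws_secret_access_key"],
        region_name="auto",
    )
    return client.generate_presigned_url(
        "get_object",
        Params={"Bucket": settings["bucket"], "Key": key},
        ExpiresIn=expires_s,
    )
```

A caller would use R2 when `r2_settings()` returns a dict and fall back to free file hosting when it returns None.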
R2 exposes an S3-compatible API, so you can manage the bucket with the AWS CLI. All commands require --region auto (R2 does not use AWS regions).

    # List all objects in bucket
    AWS_ACCESS_KEY_ID="$R2_ACCESS_KEY_ID" \
    AWS_SECRET_ACCESS_KEY="$R2_SECRET_ACCESS_KEY" \
    aws s3api list-objects-v2 \
      --bucket "$R2_BUCKET_NAME" \
      --endpoint-url "https://${R2_ACCOUNT_ID}.r2.cloudflarestorage.com" \
      --region auto

    # List objects under a specific prefix (e.g. flux2 results)
    aws s3 ls "s3://$R2_BUCKET_NAME/flux2/" \
      --endpoint-url "https://${R2_ACCOUNT_ID}.r2.cloudflarestorage.com" \
      --region auto

    # Delete a specific object
    aws s3 rm "s3://$R2_BUCKET_NAME/sadtalker/results/old-file.mp4" \
      --endpoint-url "https://${R2_ACCOUNT_ID}.r2.cloudflarestorage.com" \
      --region auto

    # Delete all objects under a prefix
    aws s3 rm "s3://$R2_BUCKET_NAME/flux2/results/" --recursive \
      --endpoint-url "https://${R2_ACCOUNT_ID}.r2.cloudflarestorage.com" \
      --region auto

Common mistake: Omitting --region auto causes an InvalidRegionName error. R2 valid regions are: wnam, enam, weur, eeur, apac, oc, auto.

Result files accumulate over time. Tools do not auto-delete after download. To check usage:

    # Quick size check per folder
    aws s3 ls "s3://$R2_BUCKET_NAME/" --recursive --summarize \
      --endpoint-url "https://${R2_ACCOUNT_ID}.r2.cloudflarestorage.com" \
      --region auto

Objects are organized by tool: sadtalker/results/, qwen3-tts/results/, flux2/results/, upscale/results/, etc. Old results are safe to delete at any time; they are copies of files already downloaded locally.
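
Because keys are grouped by tool, a per-tool storage breakdown is a matter of summing sizes by the first path segment. The sketch below assumes object records shaped like the `Contents` entries of an S3 `list_objects_v2` response (`Key` and `Size`). `usage_by_tool` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def usage_by_tool(objects: Iterable[Mapping]) -> Dict[str, int]:
    """Sum object sizes in bytes, grouped by the first path segment of the key."""
    totals: Dict[str, int] = defaultdict(int)
    for obj in objects:
        prefix = obj["Key"].split("/", 1)[0]
        totals[prefix] += obj["Size"]
    return dict(totals)


# Illustrative records only, not real bucket contents.
sample = [
    {"Key": "flux2/results/a.png", "Size": 100},
    {"Key": "flux2/results/b.png", "Size": 50},
    {"Key": "upscale/results/c.png", "Size": 7},
]
print(usage_by_tool(sample))
```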
If R2 is not configured, the tool falls back to free file hosting services:
- litterbox.catbox.moe (200MB, 24h retention)
- file.io (2GB, 1 download)
- transfer.sh (10GB, 14 days) - often down
- 0x0.st (512MB, 30 days) - blocks many requests
These work for testing but may fail intermittently for production use.
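
The size limits above decide which fallback hosts can accept a given file at all. The sketch below filters the list by file size. The limits are copied from this guide, while the try-in-listed-order behaviour and the binary interpretation of MB/GB are assumptions.

```python
from typing import List

MB = 1024 * 1024

# (host, maximum upload size in bytes), in the order listed in this guide
FALLBACK_HOSTS = [
    ("litterbox.catbox.moe", 200 * MB),
    ("file.io", 2 * 1024 * MB),
    ("transfer.sh", 10 * 1024 * MB),
    ("0x0.st", 512 * MB),
]


def eligible_hosts(file_size_bytes: int) -> List[str]:
    """Return the hosts whose size limit can accommodate the file."""
    return [name for name, limit in FALLBACK_HOSTS if file_size_bytes <= limit]


print(eligible_hosts(300 * MB))
```

An empty result means the file is too large for every fallback host, which is a strong hint to configure R2.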
You can create multiple endpoints for different use cases:

    # .env
    RUNPOD_ENDPOINT_ID=abc123xyz        # Default (RTX 3090)
    RUNPOD_ENDPOINT_ID_FAST=def456uvw   # Fast (RTX 4090)

- Go to RunPod Console > Usage
- View spend by endpoint, GPU type, and time period
- Set up billing alerts to avoid surprises
- API keys grant full access to your RunPod account - keep them secret
- R2 credentials are passed to RunPod workers for result upload - ensure your bucket is private
- Without R2, videos go through public file hosting services (not recommended for sensitive content)
- R2 objects are automatically cleaned up after download
- Presigned URLs expire after 2 hours
The toolkit is designed for extensibility. Future tools may include:
- video_gen - AI video generation (Wan I2V, in development)
- animate - Character animation
- denoise - Audio/video denoising
- stabilize - Video stabilization
Working:
- ✅ AI image editing (Qwen-Image-Edit) - style transfer, backgrounds, custom prompts
- ✅ AI upscaling (RealESRGAN) - 2x/4x with face enhancement
- ✅ AI watermark removal (ProPainter) - video inpainting
- ✅ Cloudflare R2 file transfer (reliable, fast)
- ✅ Automatic GPU detection
Current Limitations:
| Tool | Issue | Workaround |
|---|---|---|
| image_edit | Requires 48GB+ VRAM | Use A6000, L40S, or A100 GPUs |
| image_edit | Background replacement inconsistent | Use style/prompt instead |
| dewatermark | Long videos untested | Chunk videos client-side |
| dewatermark | Full resolution OOM on 24GB | Use auto resize_ratio |
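
For the "chunk videos client-side" workaround, the first step is deciding where to cut. The sketch below computes chunk boundaries in seconds. The 30-second default is an assumption based on the shortest tier in the processing-time table, and the actual cutting and re-joining (for example with ffmpeg) is not shown.

```python
from typing import List, Tuple


def chunk_ranges(duration_s: float, chunk_s: float = 30.0) -> List[Tuple[float, float]]:
    """Split [0, duration_s] into consecutive (start, end) ranges of at most chunk_s."""
    if duration_s <= 0 or chunk_s <= 0:
        raise ValueError("duration and chunk length must be positive")
    ranges = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        ranges.append((start, end))
        start = end
    return ranges


print(chunk_ranges(75))
```

Each range can then be submitted as its own job, which also keeps every job within the endpoint's execution timeout.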
| Version | Date | Changes |
|---|---|---|
| v2.0.0 | 2025-12-30 | Major fix: GPU detection (removed CUDA_VISIBLE_DEVICES), CUDA 12.4, auto resize_ratio |
| v1.2.1 | 2025-12-30 | Fix output file detection (inpaint_out.mp4 not masked_in.mp4) |
| v1.2.0 | 2025-12-30 | Fix GPU detection (use max across all GPUs), improve memory profiles |
| v1.1.0 | 2025-12-30 | Add resize_ratio parameter, improve error logging |
| v1.0.0 | 2025-12-30 | Initial R2 integration, NumPy 1.x fix |