Why
The story upload endpoint (_try_minio_upload in answer/api/router.py) does real work inside the request: decode original, generate 3 responsive WEBPs + blur + dominant colour, push 4 MinIO objects. Threaded but still on the request path.
For a 3-5 MB iPhone original this is plausibly 2-5s user-visible latency. The roadmap #274 proposed moving this to Celery; that's premature at current scale, but we can't tell when it stops being premature without measuring.
Scope
- Log per-stage timings (decode, each variant encode+put, blur, dominant colour, total) under a structured log line, e.g. upload.timing with fields user_id, bytes_in, bytes_out_lg/md/sm, stage durations, total
- Surface a simple aggregate query target. Easiest: write to an UploadLatencyEvent row (or piggyback on an existing audit table) so we can look at p50/p95/p99 without a logs pipeline
- After 2-4 weeks of data, decide: stay sync, push to background thread, or commit to a queue
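
A minimal sketch of what the per-stage timing could look like, assuming the stages in _try_minio_upload can each be wrapped in a context manager. StageTimer, build_timing_event and log_timing are hypothetical names for illustration, not existing code in the repo, and the stage names and the flat stages_ms field are one possible layout rather than a settled schema.

```python
import json
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("upload")


class StageTimer:
    """Collects wall-clock durations (in ms) for named stages of one upload."""

    def __init__(self, clock=time.perf_counter):
        # The clock is injectable so the timer can be tested deterministically.
        self._clock = clock
        self._start = clock()
        self.durations_ms = {}

    @contextmanager
    def stage(self, name):
        t0 = self._clock()
        try:
            yield
        finally:
            # Recorded even if the stage raises, so failed uploads are timed too.
            self.durations_ms[name] = round((self._clock() - t0) * 1000, 1)

    def total_ms(self):
        return round((self._clock() - self._start) * 1000, 1)


def build_timing_event(timer, user_id, bytes_in, bytes_out):
    """Assemble the upload.timing payload; bytes_out maps variant name to size."""
    return {
        "event": "upload.timing",
        "user_id": user_id,
        "bytes_in": bytes_in,
        "bytes_out_lg": bytes_out.get("lg"),
        "bytes_out_md": bytes_out.get("md"),
        "bytes_out_sm": bytes_out.get("sm"),
        "stages_ms": dict(timer.durations_ms),
        "total_ms": timer.total_ms(),
    }


def log_timing(event):
    # One JSON object per line keeps the entry easy to grep or parse later.
    logger.info(json.dumps(event, sort_keys=True))
```

Inside the endpoint this would read roughly as `with timer.stage("decode"): ...` around each step, followed by one `log_timing(build_timing_event(...))` call at the end. The same dict could also back the UploadLatencyEvent row if that route is taken.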
Threshold to revisit (rough)
Move to async when any of:
- p99 upload duration crosses ~3s consistently
- Gunicorn workers visibly back up during upload bursts (worker pool saturation)
- A 4th+ variant gets added (AVIF, larger lg, etc.)
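
As a rough illustration of the p99 criterion, the check below computes nearest-rank percentiles over recorded total durations. It reads "consistently" as "p99 above the threshold in every one of the supplied windows (e.g. one window per day)", which is an assumption made for this sketch, not a definition from this issue. The function names are hypothetical.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("percentile of empty sample")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(total_ms):
    """p50/p95/p99 of upload totals, in the same unit as the input (ms here)."""
    return {f"p{p}": percentile(total_ms, p) for p in (50, 95, 99)}


def p99_consistently_over(windows, threshold_ms=3000):
    """True if every window's p99 exceeds the threshold.

    `windows` is a list of per-window duration lists. Empty windows are
    skipped, and no data at all counts as "not over".
    """
    non_empty = [w for w in windows if w]
    if not non_empty:
        return False
    return all(percentile(w, 99) > threshold_ms for w in non_empty)
```

The same numbers could be produced directly in SQL if the events land in a table; the Python version is only meant to pin down what is being measured.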
Split from #274