Problem
The index build pipeline logs per-stage progress but has no unified progress indicator or ETA. On a MacBook building a small index (~40 chunks), the embed stage takes ~4 min with no feedback other than periodic "Embedded 10/40" log lines. For larger builds (thousands of documents), there's no way to estimate remaining time.
Proposed fix
Add a tqdm progress bar (already a dependency) in pipelines.py that:
- Shows overall pipeline progress:
    [Stage 3/4: Embedding] 24/40 chunks [02:16<01:30, 5.5s/chunk]
- Updates in real time during the embed stage (currently the longest stage)
- Shows a brief summary at the end:
    Done in 5m 23s (3 docs → 40 chunks → 40 vectors)
The embed subprocess already prints tqdm output — this would wrap the top-level orchestration.
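A rough sketch of how the top-level wrapper could look, to make the proposal concrete. The helper names (`stage_bar`, `embed_with_progress`, `format_duration`, `summary_line`) and the `embed_one` callback are hypothetical and do not come from pipelines.py. The `bar_format` string is one way to approximate the layout above using tqdm's documented format fields; the exact rendering of the rate field may differ slightly.

```python
import io

from tqdm import tqdm


def stage_bar(stage_index, stage_name, total, unit, total_stages=4,
              file=None, disable=False):
    """Create a tqdm bar labelled like '[Stage 3/4: Embedding]'.

    total_stages=4 mirrors the example in this issue; it is not read
    from the real pipeline.
    """
    return tqdm(
        total=total,
        desc=f"[Stage {stage_index}/{total_stages}: {stage_name}]",
        unit=unit,
        # Text-only layout: label, count, then elapsed<remaining and rate.
        bar_format="{desc} {n_fmt}/{total_fmt} {unit}s "
                   "[{elapsed}<{remaining}, {rate_fmt}]",
        file=file,        # None means tqdm's default (stderr)
        disable=disable,  # True silences the bar entirely
    )


def embed_with_progress(chunks, embed_one, file=None, disable=False):
    """Embed chunks one at a time, advancing the bar after each one.

    embed_one is a stand-in for whatever the real embed stage calls.
    """
    vectors = []
    with stage_bar(3, "Embedding", len(chunks), "chunk",
                   file=file, disable=disable) as bar:
        for chunk in chunks:
            vectors.append(embed_one(chunk))
            bar.update(1)
    return vectors


def format_duration(seconds):
    """Render a duration as '5m 23s', or '23s' when under a minute."""
    minutes, secs = divmod(int(round(seconds)), 60)
    return f"{minutes}m {secs}s" if minutes else f"{secs}s"


def summary_line(elapsed_seconds, docs, chunks, vectors):
    """Build the end-of-run summary line proposed in this issue."""
    return (f"Done in {format_duration(elapsed_seconds)} "
            f"({docs} docs → {chunks} chunks → {vectors} vectors)")


if __name__ == "__main__":
    # Toy run: "embedding" a chunk here just measures its length.
    out = embed_with_progress(["alpha", "beta", "gamma"], len)
    print(summary_line(323, 3, 40, len(out)))
```

Because tqdm computes the ETA from the running average rate, the estimate should settle after the first few chunks. If the embed subprocess keeps printing its own bar, the two would likely need to be reconciled (for example by disabling one of them) to avoid interleaved output.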
Bonus
- --quiet flag to suppress progress for CI/scripted use
- --dry-run flag that shows what would happen without running (N docs, estimated chunks, estimated time)
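A possible shape for the two flags, sketched with argparse. The program name `build-index`, the `dry_run_report` helper, and the per-document and per-chunk figures are illustrative assumptions; the real CLI entry point and its estimation logic may look different.

```python
import argparse


def build_parser():
    """Parser with the two proposed flags (prog name is hypothetical)."""
    parser = argparse.ArgumentParser(prog="build-index")
    parser.add_argument(
        "--quiet", action="store_true",
        help="suppress progress output (for CI or scripted use)",
    )
    parser.add_argument(
        "--dry-run", action="store_true",
        help="report what would be built without running the pipeline",
    )
    return parser


def dry_run_report(n_docs, chunks_per_doc, seconds_per_chunk):
    """Estimate chunk count and embed time from assumed averages."""
    est_chunks = round(n_docs * chunks_per_doc)
    est_seconds = est_chunks * seconds_per_chunk
    return f"{n_docs} docs, ~{est_chunks} chunks, ~{est_seconds:.0f}s estimated"


if __name__ == "__main__":
    args = build_parser().parse_args()
    if args.dry_run:
        # Placeholder averages; a real run would measure or configure these.
        print(dry_run_report(n_docs=3, chunks_per_doc=10, seconds_per_chunk=2.0))
    else:
        # args.quiet could be passed straight through as tqdm's disable= argument.
        print(f"would run the pipeline (quiet={args.quiet})")
```

At the 5.5 s/chunk rate shown in the example bar, 40 chunks works out to about 220 s, which matches the roughly 4 minute embed time reported above, so a per-chunk average from a previous run may be a reasonable basis for the dry-run estimate.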