[Enhancement] Add overall progress bar to pixelrag index build #99

Description

@aafaq-rashid-comprinno

Problem

The index build pipeline logs per-stage progress but has no unified progress indicator or ETA. On a MacBook building a small index (~40 chunks), the embed stage takes ~4 min with no feedback other than periodic "Embedded 10/40" log lines. For larger builds (thousands of documents), there's no way to estimate remaining time.

Proposed fix

Add a tqdm progress bar (already a dependency) in pipelines.py that:

  1. Shows overall pipeline progress: [Stage 3/4: Embedding] 24/40 chunks [02:16<01:30, 5.5s/chunk]
  2. Updates in real-time during the embed stage (currently the longest stage)
  3. Shows a brief summary at the end: Done in 5m 23s (3 docs → 40 chunks → 40 vectors)

The embed subprocess already prints its own tqdm output; the new bar would wrap the top-level orchestration.
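A rough sketch of what the embed-stage bar and the final summary could look like. Everything here is hypothetical: `embed_with_progress`, `format_duration`, `build_summary`, and the `embed_fn` callback are illustrative names, not existing functions in pipelines.py, and the real orchestration may hand chunks to the embed subprocess differently. Only the `tqdm(...)` call and its `desc`, `total`, `unit`, `file`, and `disable` arguments are actual tqdm API.

```python
import sys

try:
    from tqdm import tqdm
except ImportError:  # lets the sketch run even where tqdm is missing
    tqdm = None


def format_duration(seconds):
    """Render elapsed seconds as e.g. '5m 23s'."""
    minutes, secs = divmod(int(round(seconds)), 60)
    return f"{minutes}m {secs:02d}s"


def build_summary(elapsed, n_docs, n_chunks, n_vectors):
    """One-line summary printed once the pipeline finishes."""
    return (
        f"Done in {format_duration(elapsed)} "
        f"({n_docs} docs \u2192 {n_chunks} chunks \u2192 {n_vectors} vectors)"
    )


def embed_with_progress(chunks, embed_fn, stage=3, total_stages=4, quiet=False):
    """Embed each chunk, showing a tqdm bar labelled with the stage.

    embed_fn is a hypothetical per-chunk callback; the real pipeline
    may batch chunks or call a subprocess instead.
    """
    iterator = chunks
    if tqdm is not None:
        iterator = tqdm(
            chunks,
            desc=f"Stage {stage}/{total_stages}: Embedding",
            total=len(chunks),
            unit="chunk",
            file=sys.stderr,   # keep stdout clean for piped output
            disable=quiet,     # suppress the bar entirely when requested
        )
    return [embed_fn(chunk) for chunk in iterator]
```

Usage would be along the lines of `vectors = embed_with_progress(chunks, model_embed)` followed by `print(build_summary(elapsed, len(docs), len(chunks), len(vectors)))`, where `model_embed` stands in for whatever the pipeline actually calls. tqdm computes the elapsed/remaining times and the per-chunk rate itself, so no manual ETA bookkeeping should be needed.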

Bonus

  • --quiet flag to suppress progress output for CI/scripted use
  • --dry-run flag that reports what would happen without running the build (N docs, estimated chunks, estimated time)
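The two flags could be wired up roughly as follows. This is a sketch under assumptions: the project's actual CLI framework is not stated in this issue, so plain `argparse` is used for illustration, and `build_parser`, `dry_run_report`, `chunks_per_doc`, and `seconds_per_chunk` are made-up names. The estimate is a simple linear extrapolation, which is likely too crude for real documents of varying length.

```python
import argparse
import math


def build_parser():
    """Hypothetical parser exposing the two proposed flags."""
    parser = argparse.ArgumentParser(prog="pixelrag index build")
    parser.add_argument(
        "--quiet",
        action="store_true",
        help="suppress progress bars (for CI or scripted use)",
    )
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="report the planned work without building the index",
    )
    return parser


def dry_run_report(n_docs, chunks_per_doc, seconds_per_chunk):
    """Estimate chunk count and embed time from assumed averages.

    chunks_per_doc and seconds_per_chunk are assumed inputs; a real
    implementation would need to measure or configure them.
    """
    est_chunks = math.ceil(n_docs * chunks_per_doc)
    est_seconds = est_chunks * seconds_per_chunk
    return (
        f"{n_docs} docs, ~{est_chunks} estimated chunks, "
        f"~{est_seconds:.0f}s estimated embed time"
    )
```

With `args = build_parser().parse_args()`, `args.quiet` could be passed straight to tqdm's `disable` argument, and `args.dry_run` could short-circuit the pipeline after printing `dry_run_report(...)`.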
