Skip to content

Optimize fetching Recent workflows #993

@anish-work

Description

@anish-work

This will be slow on prod — measured 2.39s on the heaviest workspace, and it runs on every home-page load. I profiled it against prod (EXPLAIN ANALYZE, live DB).

The DISTINCT ON (parent_version__published_run_id) forces Postgres to read all of a workspace’s matching SavedRuns and disk-sort them just to emit 4 cards. bots_savedrun is 7.4M rows, and run history is heavily concentrated per workspace:

workspace runs (w/ parent PR) distinct published runs
5 1,780,569 1
232,791 231,343 3
3,206,451 69,119 1
3,764,345 53,981 3
3,466,675 52,471 10

Plan for workspace 5:

Limit (rows=4) actual time=2370..2383
  -> Sort (updated_at DESC)
    -> Unique  [DISTINCT ON published_run_id]
      -> Gather Merge rows=1780569
        -> Sort Key: published_run_id, updated_at DESC
              Sort Method: external merge  Disk: ~21MB
          -> Parallel Seq Scan on bots_savedrun  Filter:(workspace_id=5)  Rows Removed: 1775903
Execution Time: 2390 ms (+555ms JIT)

Crucially, no query reformulation fixes this: deriving “recent distinct workflows” from SavedRun history is inherently O(total runs), and the data above shows workspaces with millions of runs but only 1–3 distinct published runs (so a “walk -updated_at and dedup in Python” approach would scan the entire history and still never reach 4 distinct).

Proper fix: denormalize last-used. Maintain a small (workspace, published_run, last_used_at) row, upserted whenever a run is saved (and a backfill migration). Then recent workflows becomes:

WorkspaceRecentWorkflow.objects.filter(workspace=workspace)
    .order_by("-last_used_at")[:RECENT_WORKFLOW_LIST_LIMIT]

That reads 1–240 rows per workspace instead of up to 1.78M — O(distinct), exact, and index-friendly (Index(workspace, -last_used_at); add uid for the team-scoped path). Happy to share the EXPLAIN output if useful.

Originally posted by @devxpy in #992 (comment)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions