BE-580: Inline type filters onto the edition cache; rework summarizeEntities aggregation#8903
TimDiekmann wants to merge 7 commits
Replace the per-row `CROSS JOIN LATERAL unnest` in the type-ids branch with a set-returning `unnest` in the SELECT list (ProjectSet), avoiding the single-threaded nested-loop function scan that dominated runtime (~23s -> ~8s warm on the production dataset). Skip the `DISTINCT ON` edition dedup in the `hits` CTE when the compiled query cannot emit duplicate rows per edition: no fan-out (to-many) filter join was added and the variable temporal axis is a collapsed point. Dropping the sort barrier lets the planner run a fully parallel partial aggregate (~8s -> ~4s warm). `Relation::is_to_many` + `SelectCompiler::has_to_many_join` expose the join-cardinality signal; the wrapper falls back to dedup whenever duplicates are possible, so the default stays correct for arbitrary filters.
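The two query shapes described above can be sketched as SQL strings. This is a minimal illustration, not the SQL the store actually emits: the `hits` alias and `versioned_urls` column come from the description, while `type_id`, `edition_count`, and the `typeCountQuery` helper are hypothetical names chosen for the example.

```typescript
// Hypothetical sketch: the real SQL is generated by the Rust query compiler.
type UnnestStrategy = "lateral" | "projectSet";

function typeCountQuery(strategy: UnnestStrategy): string {
  if (strategy === "lateral") {
    // Old shape: one function scan per outer row, joined via a nested loop.
    return [
      "SELECT t.type_id, count(*) AS edition_count",
      "FROM hits AS h",
      "CROSS JOIN LATERAL unnest(h.versioned_urls) AS t(type_id)",
      "GROUP BY t.type_id",
    ].join("\n");
  }
  // New shape: the set-returning function sits in the SELECT list,
  // which PostgreSQL plans as a ProjectSet node over the scan of `hits`.
  return [
    "SELECT s.type_id, count(*) AS edition_count",
    "FROM (SELECT unnest(h.versioned_urls) AS type_id FROM hits AS h) AS s",
    "GROUP BY s.type_id",
  ].join("\n");
}

console.log(typeCountQuery("projectSet"));
```

Both shapes yield one output row per array element, so the aggregate is unchanged; only the plan differs.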
Remove `generateVersionedUrlMatchingFilter` and inline every call site as a direct filter: `["type", "baseUrl"]` against `*.entityTypeBaseUrl` (or `*.linkEntityTypeBaseUrl` for link types), `["type", "versionedUrl"]` where a raw `VersionedUrl` is supplied or the exact version matters (e.g. migrations, integration dedup). Drop the `inheritanceDepth = 0` qualifier: a bare `type` path resolves to the materialized `entity_edition_cache` `base_urls`/`versioned_urls` columns (GIN-indexed over the full inheritance closure) instead of the slow `entity_is_of_type` join. This moves these filters onto the cache fast path — the same motivation as the summarizeEntities work — and obsoletes the `pageEntityTypeFilter` "inheritance is slow" workaround. Matching now includes subtypes (via the closure). For the affected system and integration types this is flat today, and identity-critical lookups (Linear sync dedup, `getOrgByShortname`) discriminate on a unique property, so the type filter is only a coarse prefilter. Also migrates the remaining hand-written `inheritanceDepth = 0` filters (`machine-actors`, `org`) and the `store.rs` doc example to the same convention.
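As a rough illustration of the inlined filter shape, the sketch below builds the two variants named above. The type aliases, the example URLs, and the `leftEntity` nested path layout are assumptions made for this example; the real call sites use values such as `systemEntityTypes.*.entityTypeBaseUrl`.

```typescript
// Minimal local types for the sketch; the real filter types live in the graph SDK.
type PathExpression = { path: string[] };
type ParameterExpression = { parameter: string };
type EqualFilter = { equal: [PathExpression, ParameterExpression] };

// Hypothetical stand-ins for a system type's base URL and versioned URL.
const userEntityTypeBaseUrl = "https://example.com/@acme/types/entity-type/user/";
const userVersionedUrl = "https://example.com/@acme/types/entity-type/user/v/1";

// Matches any version of the type (and, via the closure, its subtypes).
const matchByBaseUrl: EqualFilter = {
  equal: [{ path: ["type", "baseUrl"] }, { parameter: userEntityTypeBaseUrl }],
};

// Used where the exact version matters, e.g. migrations.
const matchByVersionedUrl: EqualFilter = {
  equal: [{ path: ["type", "versionedUrl"] }, { parameter: userVersionedUrl }],
};

// Assumed nested form for link traversals: filter on the left entity's type.
const leftEntityIsUser: EqualFilter = {
  equal: [
    { path: ["leftEntity", "type", "baseUrl"] },
    { parameter: userEntityTypeBaseUrl },
  ],
};

console.log(JSON.stringify(matchByBaseUrl));
```

Note that none of the paths carry an `inheritanceDepth = 0` qualifier; per the description, the bare `type` path is what routes the filter to the cache columns.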
Codecov Report
@@            Coverage Diff             @@
## main #8903 +/- ##
==========================================
+ Coverage 59.76% 59.79% +0.03%
==========================================
Files 1348 1348
Lines 131817 131885 +68
Branches 5944 5940 -4
==========================================
+ Hits 78784 78867 +83
+ Misses 52125 52111 -14
+ Partials 908 907 -1
Pull request overview
This PR optimizes how entity queries are compiled/executed in the Postgres store by (1) reducing work in summarizeEntities (skipping unnecessary deduplication and avoiding lateral unnest expansion) and (2) inlining type-matching filters across the TypeScript codebase so they target the materialized entity_edition_cache type columns (base URLs / versioned URLs) rather than the entity_is_of_type join.
Changes:
- Reworked `summarizeEntities` aggregation SQL generation to avoid `CROSS JOIN LATERAL unnest(...)` and to conditionally skip `DISTINCT ON` based on a new deduplication decision.
- Introduced `Relation::is_to_many` and `SelectCompiler::has_to_many_join()` to detect fan-out joins and drive the deduplication decision.
- Removed `generateVersionedUrlMatchingFilter` and replaced call sites with direct `{ equal: [{ path: ["type", ...] }, { parameter: ... }] }` filters (plus nested path equivalents for link traversals).
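The deduplication decision summarized in this list can be modelled in a few lines. This is a TypeScript rendering of logic that lives in Rust, written only to make the rule concrete: `CompiledQueryShape`, its field names, and `chooseDeduplication` are hypothetical, and only the `Skip`/`Required` variants and the two conditions come from the PR text.

```typescript
// Mirrors the two variants of the Rust `Deduplication` enum named in the PR.
type Deduplication = "Skip" | "Required";

// Hypothetical summary of what the compiler knows after building the query.
interface CompiledQueryShape {
  // True if any fan-out (to-many) join was added, cf. `has_to_many_join`.
  hasToManyJoin: boolean;
  // True if the variable temporal axis is collapsed to a single point.
  variableAxisIsPoint: boolean;
}

function chooseDeduplication(shape: CompiledQueryShape): Deduplication {
  // Dedup is skipped only when duplicates are impossible on both counts;
  // every other combination falls back to the safe default.
  return !shape.hasToManyJoin && shape.variableAxisIsPoint ? "Skip" : "Required";
}

console.log(chooseDeduplication({ hasToManyJoin: false, variableAxisIsPoint: true }));
```

Defaulting to `Required` keeps arbitrary filters correct: a query only loses the `DISTINCT ON` sort barrier when both signals rule out duplicate rows per edition.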
Reviewed changes
Copilot reviewed 51 out of 51 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| libs/@local/hash-isomorphic-utils/src/page-entity-type-ids.ts | Replaces helper-based type filters with direct type.versionedUrl equality filters. |
| libs/@local/hash-isomorphic-utils/src/graph-queries.ts | Removes generateVersionedUrlMatchingFilter and trims unused imports/types. |
| libs/@local/hash-backend-utils/src/user-secret.ts | Inlines link-type filter to type.baseUrl against the edition cache. |
| libs/@local/hash-backend-utils/src/service-usage.ts | Inlines usage/service feature type filters to type.baseUrl. |
| libs/@local/hash-backend-utils/src/machine-actors.ts | Migrates type(inheritanceDepth = 0).baseUrl to type.baseUrl. |
| libs/@local/hash-backend-utils/src/google.ts | Inlines Google account type filter to type.baseUrl. |
| libs/@local/hash-backend-utils/src/flows/shared/get-flow-run-entity-by-id.ts | Inlines flow-run type filter to type.baseUrl. |
| libs/@local/hash-backend-utils/src/flows/process-flow-workflow/common-activities/persist-flow-activity.ts | Inlines flow-run type filter to type.baseUrl. |
| libs/@local/hash-backend-utils/src/flows/get-flow-context.ts | Inlines flow-run type filter to type.baseUrl. |
| libs/@local/hash-backend-utils/src/flows.ts | Inlines flow-run type filter to type.baseUrl. |
| libs/@local/graph/store/src/entity/store.rs | Updates rustdoc JSON example to use type.versionedUrl equality filter. |
| libs/@local/graph/postgres-store/src/store/postgres/query/table.rs | Adds Relation::is_to_many() to classify joins as fan-out vs to-one. |
| libs/@local/graph/postgres-store/src/store/postgres/query/statement/select.rs | Adds unit test for has_to_many_join behavior (cache vs fan-out path). |
| libs/@local/graph/postgres-store/src/store/postgres/query/compile.rs | Tracks has_to_many_join during relation-joining for later dedup decisions. |
| libs/@local/graph/postgres-store/src/store/postgres/knowledge/entity/summary.rs | Adds Deduplication enum, updates SQL wrapping, and changes type aggregation unnest strategy. |
| libs/@local/graph/postgres-store/src/store/postgres/knowledge/entity/mod.rs | Chooses Deduplication::{Skip,Required} for summarize_entities based on join fan-out + temporal axis shape. |
| apps/plugin-browser/src/pages/popup/popup-contents/action-center/history/shared/history-row/flow-metadata-cell-contents.tsx | Inlines usage record type filter to type.baseUrl. |
| apps/hash-integration-worker/src/shared/graph-requests.ts | Inlines optional entity type filter and flattens conditional clauses. |
| apps/hash-integration-worker/src/activities/flow-activities/integration-activities/persist-integration-entities-action.ts | Inlines entity/link type filters to type.versionedUrl. |
| apps/hash-frontend/src/shared/use-user-or-org.ts | Inlines user/org type filters to type.baseUrl. |
| apps/hash-frontend/src/shared/use-actors.ts | Inlines machine type filter to type.baseUrl. |
| apps/hash-frontend/src/shared/notification-count-context.tsx | Inlines notification type filters to type.baseUrl. |
| apps/hash-frontend/src/pages/shared/use-flow-runs-usage.ts | Inlines usage record type filter to type.baseUrl. |
| apps/hash-frontend/src/pages/shared/integrations/google/google-auth-context/use-google-accounts.ts | Inlines Google account type filter to type.baseUrl. |
| apps/hash-frontend/src/pages/shared/entity/entity-editor/claims-section.tsx | Inlines claim type filter to type.baseUrl. |
| apps/hash-frontend/src/pages/shared/entity-selector.tsx | Inlines expected entity type filters to type.versionedUrl. |
| apps/hash-frontend/src/pages/shared/block-collection/shared/mention-suggester.tsx | Inlines user/org type filters to type.baseUrl. |
| apps/hash-frontend/src/pages/settings/organizations/[shortname]/integrations.page.tsx | Inlines integration link type filter to type.baseUrl. |
| apps/hash-frontend/src/pages/notifications.page/notifications-with-links-context.tsx | Inlines notification type filter to type.baseUrl. |
| apps/hash-frontend/src/pages/notes.page.tsx | Inlines note type filter to type.baseUrl. |
| apps/hash-frontend/src/pages/index.page/waitlisted.tsx | Inlines prospective user type filter to type.baseUrl. |
| apps/hash-frontend/src/pages/@/[shortname]/shared/flow-visualizer/outputs/claims-output.tsx | Inlines claim type filter to type.baseUrl. |
| apps/hash-frontend/src/pages/@/[shortname].page.tsx | Inlines include-type filters to type.versionedUrl. |
| apps/hash-frontend/src/components/hooks/use-users-with-links.ts | Inlines user type filter to type.baseUrl. |
| apps/hash-frontend/src/components/hooks/use-orgs-with-links.ts | Inlines org type filter to type.baseUrl. |
| apps/hash-api/src/graphql/resolvers/knowledge/user/get-usage-records.ts | Simplifies user type filter to a single type.baseUrl equality filter. |
| apps/hash-api/src/graphql/resolvers/knowledge/org/invite-user-to-org.ts | Inlines invitation type filter selection to type.baseUrl. |
| apps/hash-api/src/graph/knowledge/system-types/user.ts | Inlines user/invitation type filters to type.baseUrl. |
| apps/hash-api/src/graph/knowledge/system-types/text.ts | Inlines link/entity type filters (including nested leftEntity/incoming link paths) to type.baseUrl. |
| apps/hash-api/src/graph/knowledge/system-types/org.ts | Migrates type(inheritanceDepth = 0).baseUrl to type.baseUrl. |
| apps/hash-api/src/graph/knowledge/system-types/notification.ts | Inlines mention/comment notification type filters to type.baseUrl. |
| apps/hash-api/src/graph/knowledge/system-types/linear-user-secret.ts | Inlines multiple nested link/entity type filters to type.baseUrl. |
| apps/hash-api/src/graph/knowledge/system-types/linear-integration-entity.ts | Inlines linear integration / sync link type filters to type.baseUrl. |
| apps/hash-api/src/graph/knowledge/system-types/block.ts | Inlines nested leftEntity type filter to type.baseUrl. |
| apps/hash-api/src/graph/ensure-system-graph-is-initialized/migrate-ontology-types/util.ts | Simplifies getEntitiesByType filter to type.versionedUrl equality. |
| apps/hash-api/src/auth/create-unverified-email-cleanup-job.ts | Inlines user type filter to type.baseUrl. |
| apps/hash-ai-worker-ts/src/shared/testing-utilities/get-alice-user-account-id.ts | Inlines user type filter to type.baseUrl. |
| apps/hash-ai-worker-ts/src/activities/shared/find-existing-entity.ts | Inlines proposed entity type filters to type.versionedUrl. |
| apps/hash-ai-worker-ts/src/activities/flow-activities/research-entities-action/coordinating-agent/summarize-existing-entities.ai.test.ts | Inlines user type filter to type.baseUrl. |
| apps/hash-ai-worker-ts/src/activities/flow-activities/research-entities-action.ts | Inlines claim type filter to type.baseUrl. |
| apps/hash-ai-worker-ts/src/activities/flow-activities/process-automatic-browsing-settings-action.ts | Inlines browser plugin settings type filter to type.baseUrl. |
/**
 * We specify each of these page types individually rather than Page, which they both inherit from,
 * because checking against types involving inheritance is currently slow.
This comment is no longer valid. We could replace the filter below with a single check against systemEntityTypes.page.entityTypeId. Or just delete this whole comment.
I removed the function and replaced the call with a proper filter.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Benchmark results
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| resolve_policies_for_actor | user: empty, selectivity: high, policies: 2002 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: medium, policies: 1001 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: high, policies: 3314 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: medium, policies: 1526 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: high, policies: 2078 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: medium, policies: 1033 | Flame Graph |
policy_resolution_medium
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| resolve_policies_for_actor | user: empty, selectivity: high, policies: 102 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: medium, policies: 51 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: high, policies: 269 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: medium, policies: 107 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: high, policies: 133 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: medium, policies: 63 | Flame Graph |
policy_resolution_none
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| resolve_policies_for_actor | user: empty, selectivity: high, policies: 2 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: medium, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: high, policies: 8 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: medium, policies: 3 | Flame Graph |
policy_resolution_small
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| resolve_policies_for_actor | user: empty, selectivity: high, policies: 52 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: medium, policies: 25 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: high, policies: 94 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: medium, policies: 26 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: high, policies: 66 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: medium, policies: 29 | Flame Graph |
read_scaling_complete
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| entity_by_id;one_depth | 1 entities | Flame Graph | |
| entity_by_id;one_depth | 10 entities | Flame Graph | |
| entity_by_id;one_depth | 25 entities | Flame Graph | |
| entity_by_id;one_depth | 5 entities | Flame Graph | |
| entity_by_id;one_depth | 50 entities | Flame Graph | |
| entity_by_id;two_depth | 1 entities | Flame Graph | |
| entity_by_id;two_depth | 10 entities | Flame Graph | |
| entity_by_id;two_depth | 25 entities | Flame Graph | |
| entity_by_id;two_depth | 5 entities | Flame Graph | |
| entity_by_id;two_depth | 50 entities | Flame Graph | |
| entity_by_id;zero_depth | 1 entities | Flame Graph | |
| entity_by_id;zero_depth | 10 entities | Flame Graph | |
| entity_by_id;zero_depth | 25 entities | Flame Graph | |
| entity_by_id;zero_depth | 5 entities | Flame Graph | |
| entity_by_id;zero_depth | 50 entities | Flame Graph |
read_scaling_linkless
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| entity_by_id | 1 entities | Flame Graph | |
| entity_by_id | 10 entities | Flame Graph | |
| entity_by_id | 100 entities | Flame Graph | |
| entity_by_id | 1000 entities | Flame Graph | |
| entity_by_id | 10000 entities | Flame Graph |
representative_read_entity
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| entity_by_id | entity type ID: `https://blockprotocol.org/@alice/types/entity-type/block/v/1` | Flame Graph | |
| entity_by_id | entity type ID: `https://blockprotocol.org/@alice/types/entity-type/book/v/1` | Flame Graph | |
| entity_by_id | entity type ID: `https://blockprotocol.org/@alice/types/entity-type/building/v/1` | Flame Graph | |
| entity_by_id | entity type ID: `https://blockprotocol.org/@alice/types/entity-type/organization/v/1` | Flame Graph | |
| entity_by_id | entity type ID: `https://blockprotocol.org/@alice/types/entity-type/page/v/2` | Flame Graph | |
| entity_by_id | entity type ID: `https://blockprotocol.org/@alice/types/entity-type/person/v/1` | Flame Graph | |
| entity_by_id | entity type ID: `https://blockprotocol.org/@alice/types/entity-type/playlist/v/1` | Flame Graph | |
| entity_by_id | entity type ID: `https://blockprotocol.org/@alice/types/entity-type/song/v/1` | Flame Graph | |
| entity_by_id | entity type ID: `https://blockprotocol.org/@alice/types/entity-type/uk-address/v/1` | Flame Graph | |
representative_read_entity_type
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| get_entity_type_by_id | Account ID: `bf5a9ef5-dc3b-43cf-a291-6210c0321eba` | Flame Graph | |
representative_read_multiple_entities
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| entity_by_property | traversal_paths=0 | 0 | |
| entity_by_property | traversal_paths=255 | 1,resolve_depths=inherit:1;values:255;properties:255;links:127;link_dests:126;type:true | |
| entity_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:0;links:0;link_dests:0;type:false | |
| entity_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:0;links:1;link_dests:0;type:true | |
| entity_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:2;links:1;link_dests:0;type:true | |
| entity_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:2;properties:2;links:1;link_dests:0;type:true | |
| link_by_source_by_property | traversal_paths=0 | 0 | |
| link_by_source_by_property | traversal_paths=255 | 1,resolve_depths=inherit:1;values:255;properties:255;links:127;link_dests:126;type:true | |
| link_by_source_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:0;links:0;link_dests:0;type:false | |
| link_by_source_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:0;links:1;link_dests:0;type:true | |
| link_by_source_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:2;links:1;link_dests:0;type:true | |
| link_by_source_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:2;properties:2;links:1;link_dests:0;type:true |
scenarios
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| full_test | query-limited | Flame Graph | |
| full_test | query-unlimited | Flame Graph | |
| linked_queries | query-limited | Flame Graph | |
| linked_queries | query-unlimited | Flame Graph |

🌟 What is the purpose of this PR?
Two changes to how entity queries are compiled in the Postgres store:
- `summarizeEntities` type aggregation no longer expands the cached type arrays with a per-row `CROSS JOIN LATERAL unnest`, and skips the `DISTINCT ON` deduplication when the compiled query cannot emit duplicate rows.
- Type filters across `apps/` and `libs/` are inlined so they resolve against the materialized `entity_edition_cache` columns instead of the `entity_is_of_type` join.
🔗 Related links
🔍 What does this change?
summarizeEntities aggregation (`hash-graph-postgres-store`):
- The type-ids branch uses a set-returning `unnest` in the `SELECT` list (ProjectSet) instead of `CROSS JOIN LATERAL unnest(...)`.
- Adds `Relation::is_to_many` and `SelectCompiler::has_to_many_join`. The `hits` CTE omits `DISTINCT ON` when no to-many (fan-out) join was added and the variable temporal axis is a collapsed point; otherwise it deduplicates as before. Selection is via a `Deduplication` enum (no boolean parameter).
Type-filter inlining (`apps/` + `libs/`):
- Removes `generateVersionedUrlMatchingFilter`. Every call site is inlined as a direct `{ equal: [{ path: ["type", "baseUrl"] }, { parameter: …entityTypeBaseUrl }] }` filter, with `["type", "versionedUrl"]` for raw `VersionedUrl` inputs and version-sensitive sites, and `…linkEntityTypeBaseUrl` for link types.
- Drops the `inheritanceDepth = 0` path qualifier: a bare `type` path resolves to the GIN-indexed `entity_edition_cache` `base_urls`/`versioned_urls` columns rather than the `entity_is_of_type` join. The remaining hand-written `inheritanceDepth = 0` filters (`machine-actors`, `org`) and a rustdoc example are migrated to the same form.
Pre-Merge Checklist 🚀
🚢 Has this modified a publishable library?
This PR:
📜 Does this require a change to the docs?
The changes in this PR:
🕸️ Does this require a change to the Turbo Graph?
The changes in this PR:
🛡 What tests cover this?
- `hash-graph-postgres-store` unit tests: `statement_all_dimensions`, `statement_count_only`, `statement_skips_dedup`, `has_to_many_join_flag`.
- TypeScript changes: type checking (`lint:tsc`).
❓ How to test this?
- `cargo nextest run -p hash-graph-postgres-store`
- `turbo run lint:tsc` for the affected TypeScript packages.