Improve session transcript search #2330
@VamsiKrishna0101 is attempting to deploy a commit to the Different AI Team on Vercel. A member of the Team first needs to authorize it.
1 issue found across 2 files
@benjaminshafii Addressed the Cubic feedback by making transcript search tokenization Unicode-aware and adding regression coverage for a non-ASCII query term. I also added a visible sidebar entry point for session search, so the feature is easier to discover without knowing the keyboard shortcut.
Validation:
Ready for review when you have a chance.
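For context on what "Unicode-aware tokenization" means here, a minimal sketch, assuming JavaScript Unicode property escapes with the u flag (ES2018+). The names tokenize and tokenizeAscii are hypothetical and not taken from session-search.ts; the PR's actual tokenizer is not shown in this thread.

```typescript
// Hypothetical tokenizer sketch, not the PR's implementation.
// Letters (\p{L}), combining marks (\p{M}) and digits (\p{N}) count as word
// characters, so "é", "ß", Cyrillic or CJK text is not split apart.
// Unicode property escapes need the u flag (ES2018+).
const TOKEN_PATTERN = new RegExp("[\\p{L}\\p{M}\\p{N}]+", "gu");

function tokenize(text: string): string[] {
  // NFKC composes sequences like "e" + U+0301 into "é" before matching.
  const normalized = text.normalize("NFKC").toLowerCase();
  return normalized.match(TOKEN_PATTERN) ?? [];
}

// ASCII-only variant for comparison: non-ASCII letters act as separators.
const ASCII_PATTERN = /[a-z0-9]+/g;

function tokenizeAscii(text: string): string[] {
  return text.toLowerCase().match(ASCII_PATTERN) ?? [];
}
```

Here tokenize("Café déjà vu") yields ["café", "déjà", "vu"], while tokenizeAscii("Café") yields ["caf"], which illustrates why an ASCII-only pattern would miss non-ASCII query terms.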
Pablosinyores left a comment
The ranked token matching (exact/prefix/substring/edit-distance) with stop-word filtering reads well, and the tests cover the interesting cases.
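The tiered matching described above can be sketched roughly as follows. Everything here is an illustrative assumption: the tier scores, the stop-word list, the four-character minimum for typo tolerance, and the one-edit budget. editDistanceWithin below is our own bounded Levenshtein stand-in, since the PR's signature for its helper of that name is not shown.

```typescript
// Hypothetical sketch of tiered token matching; names and scores are illustrative.
type MatchKind = "exact" | "prefix" | "substring" | "fuzzy";

// Illustrative stop-word list; the PR's actual list is not shown here.
const STOP_WORDS = new Set(["a", "an", "the", "of", "to", "in", "and", "or"]);

// Higher is better. Assumed values, chosen only to order the tiers.
const KIND_SCORE: Record<MatchKind, number> = { exact: 100, prefix: 70, substring: 40, fuzzy: 20 };

function queryTerms(tokens: string[]): string[] {
  const kept = tokens.filter((t) => !STOP_WORDS.has(t));
  // If the query is made only of stop words, fall back to the raw tokens.
  return kept.length > 0 ? kept : tokens;
}

// Bounded Levenshtein: true if the edit distance between a and b is <= max.
function editDistanceWithin(a: string, b: string, max: number): boolean {
  if (Math.abs(a.length - b.length) > max) return false;
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    let rowMin = i;
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
      rowMin = Math.min(rowMin, curr[j]);
    }
    // Row minima never decrease, so the final distance cannot drop back under max.
    if (rowMin > max) return false;
    prev = curr;
  }
  return prev[b.length] <= max;
}

function classify(term: string, word: string): MatchKind | null {
  if (word === term) return "exact";
  if (word.startsWith(term)) return "prefix";
  if (word.includes(term)) return "substring";
  // Allow a typo only on longer terms, where one edit is less likely to be noise.
  if (term.length >= 4 && editDistanceWithin(term, word, 1)) return "fuzzy";
  return null;
}

// Best tier for one query term across the words of a transcript entry.
function bestMatch(term: string, words: string[]): MatchKind | null {
  let best: MatchKind | null = null;
  for (const w of words) {
    const kind = classify(term, w);
    if (kind && (best === null || KIND_SCORE[kind] > KIND_SCORE[best])) best = kind;
  }
  return best;
}
```

With this ordering, bestMatch("auth", ["oauth", "authentication"]) returns "prefix", because a prefix hit outranks a substring hit.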
One concern in buildTokenSnippet: the match slice is text.slice(first.start, last.end), bounded only by the first and last matched token positions. With the AND matching across a long transcript entry, two tokens can land far apart (token A near the start, token B thousands of chars later), so the rendered match becomes the entire span between them. matchTokenizedQuery penalizes large spans in scoring (proximityBonus = max(0, 120 - span)), but the snippet itself is not capped — a low-scoring distant match still renders a huge highlight. Worth clamping the snippet to a window around the highest-scoring range (or capping last.end - first.start).
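One way to act on the clamping suggestion is to find the densest cluster of matched ranges that fits in a fixed character budget and center the snippet on it. This is a sketch under assumptions: TokenRange, buildClampedSnippet and the 160-character budget are hypothetical, and "densest cluster" is a stand-in for the "highest-scoring range" the review mentions, not the PR's actual fix. proximityBonus only restates the scoring term quoted above.

```typescript
// Hypothetical shape: a matched token's character range in the entry text.
interface TokenRange {
  start: number; // inclusive
  end: number; // exclusive
}

// Restates the scoring term quoted in the review: max(0, 120 - span).
function proximityBonus(ranges: TokenRange[]): number {
  const first = Math.min(...ranges.map((r) => r.start));
  const last = Math.max(...ranges.map((r) => r.end));
  return Math.max(0, 120 - (last - first));
}

// Assumed budget; the PR's actual limit is not shown in this thread.
const MAX_SNIPPET_CHARS = 160;

function buildClampedSnippet(text: string, ranges: TokenRange[], maxChars: number = MAX_SNIPPET_CHARS): string {
  if (ranges.length === 0) return text.slice(0, maxChars);
  const sorted = [...ranges].sort((a, b) => a.start - b.start);

  // Two-pointer scan: for each starting range i, extend j while the cluster
  // i..j still fits in maxChars; keep the cluster with the most ranges.
  let bestI = 0;
  let bestJ = 0;
  let j = 0;
  for (let i = 0; i < sorted.length; i++) {
    if (j < i) j = i;
    while (j + 1 < sorted.length && sorted[j + 1].end - sorted[i].start <= maxChars) j++;
    if (j - i > bestJ - bestI) {
      bestI = i;
      bestJ = j;
    }
  }

  const clusterStart = sorted[bestI].start;
  let clusterEnd = clusterStart;
  for (let k = bestI; k <= bestJ; k++) clusterEnd = Math.max(clusterEnd, sorted[k].end);
  // A single token longer than the budget is truncated instead of widening the window.
  clusterEnd = Math.min(clusterEnd, clusterStart + maxChars);

  // Center the cluster, but shift left near the end of the text so the window stays full.
  const pad = Math.floor((maxChars - (clusterEnd - clusterStart)) / 2);
  const start = Math.max(0, Math.min(clusterStart - pad, text.length - maxChars));
  const end = Math.min(text.length, start + maxChars);

  const prefix = start > 0 ? "…" : "";
  const suffix = end < text.length ? "…" : "";
  return prefix + text.slice(start, end) + suffix;
}
```

For an entry where one term sits near the start and another thousands of characters later, the snippet stays within the budget around one cluster instead of rendering the whole gap, and proximityBonus for that pair is 0.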
Minor: matchTokenizedQuery requires every token to match (if (!range) return null). That is stricter than the "match remembered terms" goal — a single typo beyond editDistanceWithin drops the whole entry. If partial matching is intended, scoring on matched-token count rather than all-or-nothing would track the stated behavior more closely.
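To make the all-or-nothing versus matched-count distinction concrete, here is a sketch of both scoring modes. TermMatcher, scoreAllTerms, scoreMatchedCount and comparePartial are hypothetical names; the per-term matcher is passed in so the sketch assumes nothing about matchTokenizedQuery's internals.

```typescript
// Hypothetical per-term matcher: returns a score, or null when the term does not match.
type TermMatcher = (term: string) => number | null;

// All-or-nothing: any unmatched term rejects the whole entry.
function scoreAllTerms(terms: string[], matchTerm: TermMatcher): number | null {
  let total = 0;
  for (const term of terms) {
    const s = matchTerm(term);
    if (s === null) return null;
    total += s;
  }
  return total;
}

interface PartialScore {
  matched: number; // how many query terms matched
  total: number; // summed score of the matched terms
}

// Partial alternative: keep the entry if at least one term matched.
function scoreMatchedCount(terms: string[], matchTerm: TermMatcher): PartialScore | null {
  let matched = 0;
  let total = 0;
  for (const term of terms) {
    const s = matchTerm(term);
    if (s !== null) {
      matched++;
      total += s;
    }
  }
  return matched === 0 ? null : { matched, total };
}

// Sort descending: more matched terms first, then higher summed score.
function comparePartial(a: PartialScore, b: PartialScore): number {
  return b.matched - a.matched || b.total - a.total;
}
```

When one term of a two-term query matches nothing in an entry, scoreAllTerms returns null, while scoreMatchedCount returns a result with matched: 1, so the entry can rank below full matches instead of disappearing.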
Branch updated from 82f7846 to 59899c5
Yep, agreed on the snippet issue. I updated …
On partial matching: I intentionally kept transcript matching as all-token matching for this PR. Session titles already provide looser, fuzzier discovery, while transcript matches can get noisy quickly if long messages match only one remembered term. I'd rather keep this first change precise and revisit partial-token transcript ranking separately if we want broader recall.


Summary
Why
… Ctrl+Shift+F and the command palette. … "auth redirect" find transcript text such as "authentication failed after OAuth redirect", and makes the feature easier to access from the sidebar.
Issue
Scope
apps/app/src/react-app/domains/session/search/session-search.ts.
… Search sessions action in the sidebar footer.
… SessionSearchDialog.
apps/app/tests/session-search.test.ts.
Out of scope
Testing
Ran
pnpm --filter @openwork/app exec bun test tests/session-search.test.ts
pnpm --filter @openwork/app typecheck
Result
tests/session-search.test.ts: 4 passed
tsc -p tsconfig.json --noEmit: passed
CI status
Manual verification
… Ctrl+Shift+F.
… auth redirect.
… Search sessions entry point.
… Search sessions and confirmed it opens the existing search dialog.
Evidence
Risk
Rollback