fix: preserve real prefix when middle ID segment is a reserved word #513

Open
JOhnsonKC201 wants to merge 1 commit into Egonex-AI:main from JOhnsonKC201:fix/strip-valid-prefix-middle-reserved-word


@JOhnsonKC201

Problem

stripToValidPrefix in analyzer/normalize-graph.ts collapses any node ID whose second segment happens to be a valid prefix, treating it as a double-prefix duplicate. This corrupts IDs where a reserved word legitimately appears as a middle path segment.

For example, endpoint:service:getUser is parsed as:

  • outer segment endpoint (valid prefix) ✓
  • next segment service (also a valid prefix) → wrongly assumed to be a duplicate prefix

…so the real endpoint prefix is dropped and the function returns { prefix: "service", path: "getUser" }, yielding service:getUser. This:

  • changes the node type (endpoint → service),
  • breaks edge references that point at the original ID, and
  • violates idempotency — normalizing an already-normalized ID mutates it.

Fix

Only collapse a true same-prefix duplicate (e.g. file:file:src/foo.ts) by requiring the inner segment to equal the outer prefix:

// before
if (innerColonIdx > 0 && VALID_PREFIXES.has(rest.slice(0, innerColonIdx))) {

// after
if (innerColonIdx > 0 && rest.slice(0, innerColonIdx) === segment) {

A different reserved word in the middle is a legitimate path segment and is preserved. The genuine file:file:... double-prefix case still collapses as before.
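
The difference between the two conditions can be seen in a minimal sketch. This is not the code from analyzer/normalize-graph.ts: the prefix set is a hypothetical subset, the function `strip` and its `sameOnly` flag are invented for the comparison, and the handling of IDs without a valid leading prefix is simplified to returning null.

```typescript
// Hypothetical subset of reserved prefixes; the real VALID_PREFIXES set is not shown in this PR.
const VALID_PREFIXES = new Set(["file", "endpoint", "service"]);

type Parsed = { prefix: string; path: string };

// Simplified stand-in for stripToValidPrefix.
// sameOnly = false applies the old rule (collapse on any reserved inner segment);
// sameOnly = true applies the new rule (collapse only when inner equals outer).
function strip(id: string, sameOnly: boolean): Parsed | null {
  const colonIdx = id.indexOf(":");
  if (colonIdx <= 0) return null;
  const segment = id.slice(0, colonIdx);
  if (!VALID_PREFIXES.has(segment)) return null;

  const rest = id.slice(colonIdx + 1);
  const innerColonIdx = rest.indexOf(":");
  if (innerColonIdx > 0) {
    const inner = rest.slice(0, innerColonIdx);
    const collapse = sameOnly ? inner === segment : VALID_PREFIXES.has(inner);
    // Collapsing drops the outer segment and re-parses the remainder.
    if (collapse) return strip(rest, sameOnly);
  }
  return { prefix: segment, path: rest };
}

// Joins a parsed result back into "prefix:path" form for display.
const format = (p: Parsed | null): string | null =>
  p ? `${p.prefix}:${p.path}` : null;

console.log(format(strip("endpoint:service:getUser", false))); // service:getUser (old rule)
console.log(format(strip("endpoint:service:getUser", true)));  // endpoint:service:getUser (new rule)
console.log(format(strip("file:file:src/foo.ts", true)));      // file:src/foo.ts (still collapses)
```

Under the new rule, re-running the function on its own output returns the same ID, which is the idempotency property the description refers to.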

Tests

Added two regression tests to normalize-graph.test.ts:

  • endpoint:service:getUser is preserved unchanged (was previously corrupted to service:getUser).
  • normalization is idempotent for IDs with a reserved-word middle segment.
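
The two regression cases could look roughly like the following. This is a framework-free sketch rather than the tests added to normalize-graph.test.ts: `normalizeId` is a hypothetical stand-in for the fixed normalizer, `expectEqual` is an ad hoc helper, and the prefix set is an assumed subset.

```typescript
// Hypothetical subset of reserved prefixes; the real set lives in normalize-graph.ts.
const PREFIXES = new Set(["file", "endpoint", "service"]);

// Stand-in for the fixed normalizer: collapse only a repeated outer prefix.
function normalizeId(id: string): string {
  const colonIdx = id.indexOf(":");
  if (colonIdx <= 0) return id;
  const segment = id.slice(0, colonIdx);
  const rest = id.slice(colonIdx + 1);
  if (PREFIXES.has(segment) && rest.startsWith(segment + ":")) {
    return normalizeId(rest);
  }
  return id;
}

// Minimal check helper so the sketch does not depend on a test framework.
function expectEqual(actual: string, expected: string): void {
  if (actual !== expected) {
    throw new Error(`expected ${expected}, got ${actual}`);
  }
}

// Regression 1: a reserved word in the middle is kept as a path segment.
expectEqual(normalizeId("endpoint:service:getUser"), "endpoint:service:getUser");

// Regression 2: normalizing an already-normalized ID does not change it.
const once = normalizeId("endpoint:service:getUser");
expectEqual(normalizeId(once), once);

// Existing behaviour: a true same-prefix duplicate still collapses.
expectEqual(normalizeId("file:file:src/foo.ts"), "file:src/foo.ts");
```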

Running pnpm --filter @understand-anything/core test gives 755 passing tests (including the existing file:file: double-prefix test).

stripToValidPrefix collapsed any ID whose second segment was a valid
prefix, treating e.g. "endpoint:service:x" as a double-prefix and
returning "service:x". This dropped the real outer prefix, corrupting
the node type and breaking edge references and idempotency.

Only collapse a true same-prefix duplicate (e.g. "file:file:...") by
requiring the inner segment to equal the outer prefix. A different
reserved word in the middle is a legitimate path segment and is kept.

Adds regression tests covering the middle-reserved-word case and
idempotency.

chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bfe10eed8d

  const rest = remaining.slice(colonIdx + 1);
  const innerColonIdx = rest.indexOf(":");
- if (innerColonIdx > 0 && VALID_PREFIXES.has(rest.slice(0, innerColonIdx))) {
+ if (innerColonIdx > 0 && rest.slice(0, innerColonIdx) === segment) {

P2: Use expected type when collapsing prefixed IDs

When an LLM emits a project-prefixed ID and the project name is itself a reserved prefix (for example, a file node with the ID service:file:src/foo.ts, or any other wrong but valid outer prefix ahead of the expected one), this condition no longer recurses, because the inner prefix differs from the outer one. normalizeNodeId then returns service:file:src/foo.ts for a node whose type is file. The graph therefore stops using the canonical type:path ID, and edges that reference the canonical file:src/foo.ts form are dropped as dangling. To tell this case apart from legitimate middle path segments such as endpoint:service:x, the duplicate-prefix decision needs the expected node prefix.
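
One way the reviewer's suggestion might be realized is sketched below. It is an illustration under stated assumptions, not code from the repository: `canonicalId` and its `expectedPrefix` parameter are hypothetical, the prefix set is an assumed subset, and the expected prefix is assumed to be the node's type name.

```typescript
// Hypothetical subset of reserved prefixes; the real set is not shown in this PR.
const RESERVED = new Set(["file", "endpoint", "service"]);

// Strips an outer reserved segment only when the remainder already begins with
// the prefix expected for the node's type. Anything else is left untouched.
function canonicalId(id: string, expectedPrefix: string): string {
  const marker = expectedPrefix + ":";
  let remaining = id;
  for (;;) {
    const colonIdx = remaining.indexOf(":");
    if (colonIdx <= 0) return remaining;
    const segment = remaining.slice(0, colonIdx);
    const rest = remaining.slice(colonIdx + 1);
    if (RESERVED.has(segment) && rest.startsWith(marker)) {
      // Outer segment is either a duplicate of, or a wrong prefix ahead of, the expected one.
      remaining = rest;
      continue;
    }
    return remaining;
  }
}

console.log(canonicalId("service:file:src/foo.ts", "file"));      // file:src/foo.ts
console.log(canonicalId("endpoint:service:getUser", "endpoint")); // endpoint:service:getUser
console.log(canonicalId("file:file:src/foo.ts", "file"));         // file:src/foo.ts
```

The sketch resolves the ambiguity purely from the expected prefix, so the same string can normalize differently for different node types: endpoint:service:getUser passed with an expected prefix of service would become service:getUser. Whether that trade-off is acceptable depends on how normalizeNodeId obtains the node type, which this PR does not show.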
