Fix markdown parser dropping local files whose name starts with "http" #494

Open
Osamaali313 wants to merge 1 commit into Egonex-AI:main from Osamaali313:fix/markdown-local-http-refs

Summary

MarkdownParser.extractReferences is meant to extract local file/image references and skip external URLs (per its docstring). It does so with:

if (target.startsWith("http")) continue; // Skip external URLs

But startsWith("http") is a plain prefix check, not a URL-scheme check, so any local file whose name begins with "http" is silently dropped, e.g. [guide](http-client.md), [notes](https-setup.md), [x](httpie.md). These are relative links that should become file reference edges in the knowledge graph; instead they are discarded, leaving edges missing.

Fix

Match an actual URL scheme instead of the "http" prefix:

-      if (target.startsWith("http")) continue; // Skip external URLs
+      // Skip external URLs (scheme://... or protocol-relative //...), but keep
+      // local files whose name merely begins with "http" (e.g. http-client.md).
+      if (/^[a-zA-Z][a-zA-Z0-9+.-]*:\/\//.test(target) || target.startsWith("//")) continue;

This still skips https://example.com, http://…, ftp://…, and protocol-relative //cdn/…, while keeping local files like http-client.md.
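
As a rough illustration of the new condition in isolation, the sketch below wraps it in a helper and runs it over a few targets. The name `isExternalTarget` and the sample list are hypothetical; the PR itself inlines the check inside extractReferences.

```typescript
// Hypothetical helper (not part of the PR): the same condition as the new
// line in the diff, pulled out so it can be exercised on its own.
function isExternalTarget(target: string): boolean {
  // A scheme is a letter followed by letters, digits, "+", ".", or "-",
  // and must be followed by "://" to count as an external URL here.
  const hasScheme = /^[a-zA-Z][a-zA-Z0-9+.-]*:\/\//.test(target);
  // Protocol-relative URLs start with "//".
  return hasScheme || target.startsWith("//");
}

// Illustrative targets only; these are assumptions, not project fixtures.
const sampleTargets: string[] = [
  "https://example.com",
  "ftp://host/file",
  "//cdn/lib.js",
  "http-client.md",
  "https-setup.md",
  "httpie.md",
  "./setup.md",
];

const keptTargets = sampleTargets.filter((t) => !isExternalTarget(t));
console.log(keptTargets);
// → ["http-client.md", "https-setup.md", "httpie.md", "./setup.md"]
```

The old startsWith("http") check would have kept only ./setup.md from this list.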

Testing

Added a case to the MarkdownParser suite in parsers.test.ts:

it("keeps local files whose name starts with 'http'", () => {
  const content = "[client](http-client.md) and [ext](https://example.com)";
  const refs = parser.extractReferences!("README.md", content);
  expect(refs).toHaveLength(1);
  expect(refs[0].target).toBe("http-client.md");
});

Verified standalone with the exact extractReferences regex/logic:

input: See [HTTP client](http-client.md) and [setup](./setup.md), but ignore [site](https://example.com) and ![img](pic.png).
BEFORE: ["file:./setup.md","image:pic.png"]                       (http-client.md dropped)
AFTER:  ["file:http-client.md","file:./setup.md","image:pic.png"] (kept; external URL still skipped)
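
The BEFORE/AFTER comparison above can be approximated with a self-contained sketch like the following. The link regex, the `Ref` type, and the `extractRefs` function are assumptions made for illustration; they stand in for the real MarkdownParser.extractReferences, whose exact pattern is not shown in this PR.

```typescript
type Ref = { type: "file" | "image"; target: string };

// Hypothetical stand-in for extractReferences. The link pattern is an
// assumption: optional "!" (image), "[label]", then "(target)".
function extractRefs(content: string, skip: (target: string) => boolean): Ref[] {
  const refs: Ref[] = [];
  const linkPattern = /(!?)\[[^\]]*\]\(([^)\s]+)\)/g;
  let match: RegExpExecArray | null;
  while ((match = linkPattern.exec(content)) !== null) {
    const target = match[2];
    if (skip(target)) continue; // Skip external URLs
    refs.push({ type: match[1] === "!" ? "image" : "file", target });
  }
  return refs;
}

// The old and new skip conditions, copied from the diff.
const skipBefore = (t: string): boolean => t.startsWith("http");
const skipAfter = (t: string): boolean =>
  /^[a-zA-Z][a-zA-Z0-9+.-]*:\/\//.test(t) || t.startsWith("//");

const sampleInput =
  "See [HTTP client](http-client.md) and [setup](./setup.md), " +
  "but ignore [site](https://example.com) and ![img](pic.png).";

const formatRefs = (refs: Ref[]): string[] => refs.map((r) => `${r.type}:${r.target}`);

console.log(formatRefs(extractRefs(sampleInput, skipBefore)));
// → ["file:./setup.md", "image:pic.png"]
console.log(formatRefs(extractRefs(sampleInput, skipAfter)));
// → ["file:http-client.md", "file:./setup.md", "image:pic.png"]
```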

The existing "extracts file references" and "skips external URLs in references" tests remain satisfied by the fix.

extractReferences skipped any link target starting with "http" to filter
external URLs, but startsWith("http") also matches local files such as
http-client.md, https-setup.md, or httpie.md, so those relative references
were silently dropped from the knowledge graph. Match an actual URL scheme
(scheme://... or protocol-relative //...) instead, so external URLs are still
skipped while local files are kept.
Copilot AI review requested due to automatic review settings June 21, 2026 14:28

Copilot AI left a comment

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.
