Skip to content

feat: convert <img> HTML tags to ac:image in Confluence output#777

Closed
moia-sven-ole wants to merge 30 commits into
kovetskiy:masterfrom
moia-sven-ole:feature/html-img-tag-support
Closed

feat: convert <img> HTML tags to ac:image in Confluence output#777
moia-sven-ole wants to merge 30 commits into
kovetskiy:masterfrom
moia-sven-ole:feature/html-img-tag-support

Conversation

@moia-sven-ole

Copy link
Copy Markdown
Contributor

Feature: Convert HTML tags to ac:image in Confluence output

Markdown natively supports HTML <img> tags, and authors commonly use them to control image size while keeping documents readable in standard Markdown renderers like GitHub.
Mark had no handling for these tags, so they would be stripped or passed through as raw HTML, which Confluence does not render.

This change detects <img> tags in HTML blocks and converts them to Confluence's ac:image format, consistent with how Markdown-syntax images (![]()) are already handled. Local files are uploaded as attachments, external URLs are linked directly. The width, alt, and title attributes are all carried over.

The practical benefit over the existing macro-based approach (<!-- width=300 -->) is that sizing an image no longer requires two separate steps. A single <img> tag handles the upload and the size in one pass, using syntax that works in many Markdown renderer without mark-specific extensions.

The PR also adds an integration test covering the three main paths (local attachment with attributes, external URL, URL with special characters) and documents the feature in the README.

In addition I think this also fixes #491 and #325

Handles <img src="..." width="600" /> blocks so users can control
image display width inline without any separate metadata headers.
Width from the tag takes precedence over detected file dimensions.
@moia-sven-ole

Copy link
Copy Markdown
Contributor Author

The failing test looks like a timeout issue in TestExtractD2Image(d2_test.go).
I guess this one is flaky since I have not touched the code there or it is related to the github actions issue
image

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds support for converting HTML <img> tags inside Markdown HTML blocks into Confluence <ac:image> markup, mirroring the existing handling of Markdown ![]() image syntax. Local files are uploaded as attachments (re-using attachment.ResolveLocalAttachments) while non-local sources fall through to ri:url. The src, width, alt, and title attributes are propagated, addressing issues #491 and #325.

Changes:

  • Extends ConfluenceHTMLBlockRenderer with dependencies (Attacher, path, image-align) and adds tryRenderImgTag / parseImgAttrs to detect and render <img> tags via the ac:image template.
  • Updates markdown.go extensions to wire the new constructor parameters.
  • Adds README documentation, a unit test for attribute parsing, and golden-file integration tests (testdata/html-img.{md,html}).

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
renderer/htmlblock.go New <img>ac:image conversion logic and renderer field additions
renderer/htmlblock_test.go Unit tests for parseImgAttrs only
markdown/markdown.go Passes new constructor args to NewConfluenceHTMLBlockRenderer in both extensions
testdata/html-img.md Integration input covering attachment, URL, and URL-with-entities cases
testdata/html-img.html Expected ac:image output for the integration test
README.md User-facing documentation of the new feature
Comments suppressed due to low confidence (5)

renderer/htmlblock.go:131

  • In the URL branch the alignment is taken straight from r.ImageAlign, but the local-attachment branch (and ConfluenceImageRenderer) call calculateAlign(r.ImageAlign, effectiveWidth) to enforce the ≥760 → "center" rule. This means an external <img> with width ≥ 760 will not be forced to centered alignment as it is in the other code paths, producing inconsistent rendering between local and remote images.
		effectiveAlign := r.ImageAlign
		err = r.Stdlib.Templates.ExecuteTemplate(w, "ac:image", struct {
			Align          string
			Layout         string
			OriginalWidth  string
			OriginalHeight string
			Width          string
			Height         string
			Title          string
			Alt            string
			Attachment     string
			Url            string
		}{
			effectiveAlign,
			calculateLayout(effectiveAlign, width),
			"", "", width, "", title, alt, "", escapedURL,

renderer/htmlblock.go:137

  • Detection of "is this a local file?" relies on os.Open failing for everything that is not a local file. filepath.Join(base, "https://example.com/logo.png") produces something like base/https:/example.com/logo.png, and the URL branch is only reached because the open happens to fail. This is fragile (an attacker-controlled or accidental directory matching the URL prefix could change behavior, and on Windows the : makes the path itself invalid in different ways) and conflates "file not found" with "this is a URL", which will also hide genuine read errors for local files (e.g. permission denied) by silently turning them into ri:url references. Consider explicitly parsing src with url.Parse and treating any scheme-prefixed value (http, https, mailto, etc.) as a URL up-front.
	attachments, err := attachment.ResolveLocalAttachments(vfs.LocalOS, filepath.Dir(r.Path), []string{src})
	if err != nil || len(attachments) == 0 {
		// Not a local file — treat as URL
		escapedURL := strings.ReplaceAll(src, "&", "&amp;")
		effectiveAlign := r.ImageAlign
		err = r.Stdlib.Templates.ExecuteTemplate(w, "ac:image", struct {
			Align          string
			Layout         string
			OriginalWidth  string
			OriginalHeight string
			Width          string
			Height         string
			Title          string
			Alt            string
			Attachment     string
			Url            string
		}{
			effectiveAlign,
			calculateLayout(effectiveAlign, width),
			"", "", width, "", title, alt, "", escapedURL,
		})
		if err != nil {
			return ast.WalkStop, err
		}
		return ast.WalkSkipChildren, nil
	}

renderer/htmlblock.go:105

  • The prefix check strings.HasPrefix(strings.ToLower(raw), "<img ") will miss valid HTML such as <img\tsrc="...">, <img\nsrc=...>, or <img> with attributes on the next line (which goldmark will still emit as a single multi-line HTMLBlock line entry). Since parseImgAttrs already uses html.ParseFragment, you could rely on that parser to decide whether the block actually contains an <img> rather than doing a strict prefix match on raw text.
	if !strings.HasPrefix(strings.ToLower(raw), "<img ") {
		return ast.WalkContinue, nil
	}

renderer/htmlblock.go:114

  • The return value of r.Attachments.Attach(attachments[0]) is ignored (the interface returns nothing today, but more importantly the error from ResolveLocalAttachments is silently swallowed on line 113: err != nil || len(attachments) == 0 falls through to the URL branch. That means a real I/O error reading a local file (e.g. permission denied, partially readable image whose image.DecodeConfig fails) will be reported to the user not as an error but as a silently-broken ri:url link pointing at the relative local path. This will be very confusing to debug. Consider distinguishing "file does not exist" from other errors and surfacing the latter.
	attachments, err := attachment.ResolveLocalAttachments(vfs.LocalOS, filepath.Dir(r.Path), []string{src})
	if err != nil || len(attachments) == 0 {
		// Not a local file — treat as URL

renderer/htmlblock.go:173

  • The two ExecuteTemplate calls duplicate the entire anonymous struct definition with the same ten fields. Consider extracting a named struct (or a small helper that takes the resolved fields and renders the template) to remove the duplication between the URL and attachment branches — and from the similar duplicated literals in renderer/image.go and renderer/fencedcodeblock.go.
		err = r.Stdlib.Templates.ExecuteTemplate(w, "ac:image", struct {
			Align          string
			Layout         string
			OriginalWidth  string
			OriginalHeight string
			Width          string
			Height         string
			Title          string
			Alt            string
			Attachment     string
			Url            string
		}{
			effectiveAlign,
			calculateLayout(effectiveAlign, width),
			"", "", width, "", title, alt, "", escapedURL,
		})
		if err != nil {
			return ast.WalkStop, err
		}
		return ast.WalkSkipChildren, nil
	}

	r.Attachments.Attach(attachments[0])

	// Width from the <img> tag takes precedence over the detected file width.
	effectiveWidth := width
	if effectiveWidth == "" {
		effectiveWidth = attachments[0].Width
	}

	effectiveAlign := calculateAlign(r.ImageAlign, effectiveWidth)
	effectiveLayout := calculateLayout(effectiveAlign, effectiveWidth)
	displayWidth := calculateDisplayWidth(effectiveWidth, effectiveLayout)

	err = r.Stdlib.Templates.ExecuteTemplate(w, "ac:image", struct {
		Align          string
		Layout         string
		OriginalWidth  string
		OriginalHeight string
		Width          string
		Height         string
		Title          string
		Alt            string
		Attachment     string
		Url            string
	}{
		effectiveAlign,
		effectiveLayout,
		attachments[0].Width,
		attachments[0].Height,
		displayWidth,
		"",
		title,
		alt,
		attachments[0].Filename,
		"",
	})

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread renderer/htmlblock.go Outdated
Comment thread renderer/htmlblock_test.go
@moia-sven-ole moia-sven-ole force-pushed the feature/html-img-tag-support branch from e7732ed to d56733f Compare May 18, 2026 19:06
@moia-sven-ole moia-sven-ole force-pushed the feature/html-img-tag-support branch from d56733f to 4c57756 Compare May 18, 2026 19:09
@mrueg mrueg requested a review from Copilot May 18, 2026 19:10

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 7 comments.

Comment thread renderer/htmlblock.go Outdated
Comment thread renderer/htmlblock.go Outdated
Comment thread renderer/htmlblock.go Outdated
Comment thread renderer/htmlblock_test.go Outdated
Comment thread renderer/htmlblock_test.go Outdated
Comment thread renderer/htmlblock.go
Comment thread renderer/htmlblock.go Outdated
  Covers URL rendering, ampersand escaping, wide-image alignment,
  local attachment, I/O error propagation, and non-img passthrough.

  Also adds three failing tests documenting known bugs: unescaped XML
  characters in alt/title, missing local files aborting the render,
  and non-http/https schemes hard-erroring instead of falling back.
  html.ParseFragment decodes HTML entities in attribute values, so
  characters like & and " need to be re-escaped before passing into
  the ac:image XML template to avoid malformed Confluence storage XML.
  Previously any unresolvable local path (missing file, non-http scheme
  like data:) returned WalkStop and aborted the entire page render.
  Now only hard I/O errors (permission denied etc.) abort; file-not-found
  falls back to rendering the src as a URL, matching the behaviour of
  the Markdown image renderer.
  For multi-line blocks, check line 0 for known ac:layout patterns
  then bail immediately, avoiding pointless iteration over remaining
  lines that could never match.
  The prefix check was a fast path but fragile — it missed valid HTML
  like <img\tsrc="...">. parseImgAttrs already uses html.ParseFragment
  which correctly detects any <img> variant, and returns empty src for
  non-img input. The prefix check was redundant and less correct.
  Previously only http/https were detected upfront; schemes like data:,
  ftp:, mailto: fell through to ResolveLocalAttachments and relied on
  ErrNotExist as an accidental fallback. Now any non-empty scheme is
  routed directly to the URL branch, making the intent explicit and
  ResolveLocalAttachments only called for bare relative paths.
…comments

  Use htmlstdlib.EscapeString consistently for URL escaping instead of
  only replacing & — html.ParseFragment decodes all HTML entities in
  attribute values so <, >, and " also need re-escaping for valid XML.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Comment thread renderer/htmlblock.go Outdated
Comment thread renderer/htmlblock.go Outdated
…ct URL scheme detection

  - URL and missing-file-fallback branches now apply calculateDisplayWidth
    so full-width external images correctly emit ac:width="1800" instead
    of the raw pixel value, consistent with the attachment branch.
  - Replace u.Scheme != "" with an explicit allowlist (http, https, ftp,
    ftps, data, mailto) to avoid misclassifying filenames containing colons
    (e.g. images:foo.png) as URLs.
…Scheme

  Previously the rejection block used u.Opaque == "" to detect hierarchical
  URLs, which also caught harmless schemes like sftp://, s3://, ssh:// and
  aborted the document walk with WalkStop. blob: URIs slipped past the guard
  into local file resolution. Replace with an explicit isDangerousScheme
  allowlist (javascript, vbscript, file) so only truly dangerous schemes abort
  the walk.
…te rendering

  src attribute values are HTML-entity-decoded by html.ParseFragment, so a
  filename like arch&amp;logo.png becomes arch&logo.png. The convertAttachment
  template function only replaces / with _ and does not XML-escape, so the
  unescaped & produced malformed XML in ri:filename. Similarly, a width value
  containing XML-special characters was passed to ac:width unescaped.

  Escape width alongside alt and title at the top of tryRenderImgTag, and
  escape Filename at the assignment site.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Comment thread renderer/htmlblock.go Outdated
Comment thread renderer/htmlblock_test.go Outdated
Comment thread renderer/htmlblock.go Outdated
@moia-sven-ole moia-sven-ole requested a review from Copilot June 10, 2026 10:09

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Comment thread renderer/htmlblock.go
@moia-sven-ole moia-sven-ole requested a review from Copilot June 12, 2026 18:16

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

Comments suppressed due to low confidence (1)

renderer/htmlblock.go:1

  • The new break means multi-line HTML blocks (l > 1) stop scanning after the first line, so Confluence directive comment markers (e.g. <!-- ac:layout -->) on line 2+ will no longer be recognized. To preserve prior behavior, scan all lines for these markers regardless of l, and only branch into the <img> conversion paths separately (e.g., after the marker scan completes).
package renderer

Comment thread renderer/htmlblock.go
Comment thread renderer/htmlblock.go Outdated
Comment thread renderer/htmlblock.go Outdated
Comment thread renderer/htmlblock.go Outdated
Comment thread README.md
@moia-sven-ole moia-sven-ole requested a review from Copilot June 12, 2026 18:38

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

Comment thread renderer/htmlblock.go Outdated
Comment thread renderer/htmlblock.go
Comment thread renderer/htmlblock.go
Comment thread renderer/htmlblock.go
@moia-sven-ole moia-sven-ole requested a review from Copilot June 12, 2026 19:05

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

Comment thread renderer/htmlblock.go
Comment thread renderer/htmlblock.go Outdated
Comment thread renderer/htmlblock.go
Comment thread README.md
Comment thread renderer/htmlblock.go Outdated
@moia-sven-ole moia-sven-ole requested a review from Copilot June 12, 2026 19:29

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 11 changed files in this pull request and generated 2 comments.

Comment thread renderer/htmlblock.go Outdated
Comment thread renderer/htmlblock.go
@moia-sven-ole moia-sven-ole requested a review from Copilot June 12, 2026 20:04

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 11 changed files in this pull request and generated 3 comments.

Comment thread renderer/htmlblock.go
Comment thread renderer/htmlblock_test.go Outdated
Comment thread renderer/htmlblock.go Outdated
@moia-sven-ole moia-sven-ole requested a review from Copilot June 12, 2026 20:15

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@moia-sven-ole moia-sven-ole requested a review from Copilot June 12, 2026 20:38

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 11 changed files in this pull request and generated 4 comments.

Comment thread renderer/htmlblock.go
Comment on lines +322 to +338
// Force ri:url fallback for absolute paths or Windows UNC/drive paths
// to prevent local file exfiltration outside relative directories.
if filepath.IsAbs(sanitizedSrc) || isWindowsDrivePath(sanitizedSrc) || strings.HasPrefix(sanitizedSrc, "\\\\") {
return r.renderImgURL(w, sanitizedSrc, width, alt, title)
}

if r.Attachments == nil {
return r.renderImgURL(w, sanitizedSrc, width, alt, title)
}

attachments, err := attachment.ResolveLocalAttachments(vfs.LocalOS, filepath.Dir(r.Path), []string{sanitizedSrc})
if err != nil {
return r.renderImgURL(w, sanitizedSrc, width, alt, title)
}
if len(attachments) == 0 {
return r.renderImgURL(w, sanitizedSrc, width, alt, title)
}
Comment thread renderer/htmlblock.go
Comment on lines +250 to +265
func validateImgTagConversionInput(src, width string) (string, bool, error) {
// Sanitize src by stripping leading/trailing whitespace and ASCII control characters
src = strings.TrimFunc(src, func(r rune) bool {
return r <= ' ' || r == 127
})

if src == "" {
return "", false, nil
}

if u, err := url.Parse(src); err == nil {
scheme := strings.ToLower(u.Scheme)
if isDangerousScheme(scheme) {
return "", false, fmt.Errorf("img src %q: unsupported URL scheme %q", src, u.Scheme)
}
}
Comment thread renderer/htmlblock.go
Comment on lines +340 to +364
r.Attachments.Attach(attachments[0])

// Width from the <img> tag takes precedence over the detected file width.
effectiveWidth := width
if effectiveWidth == "" {
effectiveWidth = attachments[0].Width
}

effectiveAlign := calculateAlign(r.ImageAlign, effectiveWidth)
effectiveLayout := calculateLayout(effectiveAlign, effectiveWidth)
displayWidth := calculateDisplayWidth(effectiveWidth, effectiveLayout)

err = r.Stdlib.Templates.ExecuteTemplate(w, "ac:image", acImageParams{
Align: effectiveAlign,
Layout: effectiveLayout,
OriginalWidth: attachments[0].Width,
OriginalHeight: attachments[0].Height,
Width: htmlstdlib.EscapeString(displayWidth),
Title: htmlstdlib.EscapeString(title),
Alt: htmlstdlib.EscapeString(alt),
Attachment: htmlstdlib.EscapeString(attachments[0].Filename),
})
if err != nil {
return ast.WalkStop, err
}
Comment thread README.md
Comment on lines +789 to +794
#### Validation & Safety Rules

* **Width:** Must be a positive integer without units or surrounding whitespace (e.g., `width="600"` is valid; `width="100%"` or `width="600px"` are invalid and will skip conversion, falling back to raw HTML).
* **Scheme Sanitization:** Dangerous URL schemes (such as `javascript:`, `vbscript:`, or `file:`) in the `src` attribute are strictly blocked, returning a compilation error instead of outputting unsafe content.
* **Safe-mode Integration:** Standalone `<img>` tags are converted and resolved even in safe mode (when other unsafe raw HTML is disabled) because their content, schemes, and attributes are strictly sanitized and whitelisted.
* **Absolute Path Fallback:** To prevent system file access or path traversal, absolute paths (e.g. `/etc/...` or `C:\...`) and UNC paths (`\\server\...`) are not attached locally; they are safely fallback-rendered as external links (`ri:url`).
@moia-sven-ole

Copy link
Copy Markdown
Contributor Author

Closing this in favor of #789. I ended up down a rabbit hole trying to build a perfect solution to parse the entire HTML, which added a lot of unnecessary complexity just to enable tags. The new PR handles this much more cleanly without the overhead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

HTML <img> Tags Don't Display Images on Confluence

2 participants