feat: convert <img> HTML tags to ac:image in Confluence output#777
feat: convert <img> HTML tags to ac:image in Confluence output#777moia-sven-ole wants to merge 30 commits into
Conversation
Handles <img src="..." width="600" /> blocks so users can control image display width inline without any separate metadata headers. Width from the tag takes precedence over detected file dimensions.
There was a problem hiding this comment.
Pull request overview
Adds support for converting HTML <img> tags inside Markdown HTML blocks into Confluence <ac:image> markup, mirroring the existing handling of Markdown ![]() image syntax. Local files are uploaded as attachments (re-using attachment.ResolveLocalAttachments) while non-local sources fall through to ri:url. The src, width, alt, and title attributes are propagated, addressing issues #491 and #325.
Changes:
- Extends
ConfluenceHTMLBlockRendererwith dependencies (Attacher, path, image-align) and addstryRenderImgTag/parseImgAttrsto detect and render<img>tags via theac:imagetemplate. - Updates
markdown.goextensions to wire the new constructor parameters. - Adds README documentation, a unit test for attribute parsing, and golden-file integration tests (
testdata/html-img.{md,html}).
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| renderer/htmlblock.go | New <img> → ac:image conversion logic and renderer field additions |
| renderer/htmlblock_test.go | Unit tests for parseImgAttrs only |
| markdown/markdown.go | Passes new constructor args to NewConfluenceHTMLBlockRenderer in both extensions |
| testdata/html-img.md | Integration input covering attachment, URL, and URL-with-entities cases |
| testdata/html-img.html | Expected ac:image output for the integration test |
| README.md | User-facing documentation of the new feature |
Comments suppressed due to low confidence (5)
renderer/htmlblock.go:131
- In the URL branch the alignment is taken straight from
r.ImageAlign, but the local-attachment branch (andConfluenceImageRenderer) callcalculateAlign(r.ImageAlign, effectiveWidth)to enforce the ≥760 → "center" rule. This means an external<img>with width ≥ 760 will not be forced to centered alignment as it is in the other code paths, producing inconsistent rendering between local and remote images.
effectiveAlign := r.ImageAlign
err = r.Stdlib.Templates.ExecuteTemplate(w, "ac:image", struct {
Align string
Layout string
OriginalWidth string
OriginalHeight string
Width string
Height string
Title string
Alt string
Attachment string
Url string
}{
effectiveAlign,
calculateLayout(effectiveAlign, width),
"", "", width, "", title, alt, "", escapedURL,
renderer/htmlblock.go:137
- Detection of "is this a local file?" relies on
os.Openfailing for everything that is not a local file.filepath.Join(base, "https://example.com/logo.png")produces something likebase/https:/example.com/logo.png, and the URL branch is only reached because the open happens to fail. This is fragile (an attacker-controlled or accidental directory matching the URL prefix could change behavior, and on Windows the:makes the path itself invalid in different ways) and conflates "file not found" with "this is a URL", which will also hide genuine read errors for local files (e.g. permission denied) by silently turning them intori:urlreferences. Consider explicitly parsingsrcwithurl.Parseand treating any scheme-prefixed value (http,https,mailto, etc.) as a URL up-front.
attachments, err := attachment.ResolveLocalAttachments(vfs.LocalOS, filepath.Dir(r.Path), []string{src})
if err != nil || len(attachments) == 0 {
// Not a local file — treat as URL
escapedURL := strings.ReplaceAll(src, "&", "&")
effectiveAlign := r.ImageAlign
err = r.Stdlib.Templates.ExecuteTemplate(w, "ac:image", struct {
Align string
Layout string
OriginalWidth string
OriginalHeight string
Width string
Height string
Title string
Alt string
Attachment string
Url string
}{
effectiveAlign,
calculateLayout(effectiveAlign, width),
"", "", width, "", title, alt, "", escapedURL,
})
if err != nil {
return ast.WalkStop, err
}
return ast.WalkSkipChildren, nil
}
renderer/htmlblock.go:105
- The prefix check
strings.HasPrefix(strings.ToLower(raw), "<img ")will miss valid HTML such as<img\tsrc="...">,<img\nsrc=...>, or<img>with attributes on the next line (which goldmark will still emit as a single multi-line HTMLBlock line entry). SinceparseImgAttrsalready useshtml.ParseFragment, you could rely on that parser to decide whether the block actually contains an<img>rather than doing a strict prefix match on raw text.
if !strings.HasPrefix(strings.ToLower(raw), "<img ") {
return ast.WalkContinue, nil
}
renderer/htmlblock.go:114
- The return value of
r.Attachments.Attach(attachments[0])is ignored (the interface returns nothing today, but more importantly the error fromResolveLocalAttachmentsis silently swallowed on line 113:err != nil || len(attachments) == 0falls through to the URL branch. That means a real I/O error reading a local file (e.g. permission denied, partially readable image whoseimage.DecodeConfigfails) will be reported to the user not as an error but as a silently-brokenri:urllink pointing at the relative local path. This will be very confusing to debug. Consider distinguishing "file does not exist" from other errors and surfacing the latter.
attachments, err := attachment.ResolveLocalAttachments(vfs.LocalOS, filepath.Dir(r.Path), []string{src})
if err != nil || len(attachments) == 0 {
// Not a local file — treat as URL
renderer/htmlblock.go:173
- The two
ExecuteTemplatecalls duplicate the entire anonymous struct definition with the same ten fields. Consider extracting a named struct (or a small helper that takes the resolved fields and renders the template) to remove the duplication between the URL and attachment branches — and from the similar duplicated literals inrenderer/image.goandrenderer/fencedcodeblock.go.
err = r.Stdlib.Templates.ExecuteTemplate(w, "ac:image", struct {
Align string
Layout string
OriginalWidth string
OriginalHeight string
Width string
Height string
Title string
Alt string
Attachment string
Url string
}{
effectiveAlign,
calculateLayout(effectiveAlign, width),
"", "", width, "", title, alt, "", escapedURL,
})
if err != nil {
return ast.WalkStop, err
}
return ast.WalkSkipChildren, nil
}
r.Attachments.Attach(attachments[0])
// Width from the <img> tag takes precedence over the detected file width.
effectiveWidth := width
if effectiveWidth == "" {
effectiveWidth = attachments[0].Width
}
effectiveAlign := calculateAlign(r.ImageAlign, effectiveWidth)
effectiveLayout := calculateLayout(effectiveAlign, effectiveWidth)
displayWidth := calculateDisplayWidth(effectiveWidth, effectiveLayout)
err = r.Stdlib.Templates.ExecuteTemplate(w, "ac:image", struct {
Align string
Layout string
OriginalWidth string
OriginalHeight string
Width string
Height string
Title string
Alt string
Attachment string
Url string
}{
effectiveAlign,
effectiveLayout,
attachments[0].Width,
attachments[0].Height,
displayWidth,
"",
title,
alt,
attachments[0].Filename,
"",
})
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
e7732ed to
d56733f
Compare
d56733f to
4c57756
Compare
Covers URL rendering, ampersand escaping, wide-image alignment, local attachment, I/O error propagation, and non-img passthrough. Also adds three failing tests documenting known bugs: unescaped XML characters in alt/title, missing local files aborting the render, and non-http/https schemes hard-erroring instead of falling back.
html.ParseFragment decodes HTML entities in attribute values, so characters like & and " need to be re-escaped before passing into the ac:image XML template to avoid malformed Confluence storage XML.
Previously any unresolvable local path (missing file, non-http scheme like data:) returned WalkStop and aborted the entire page render. Now only hard I/O errors (permission denied etc.) abort; file-not-found falls back to rendering the src as a URL, matching the behaviour of the Markdown image renderer.
For multi-line blocks, check line 0 for known ac:layout patterns then bail immediately, avoiding pointless iteration over remaining lines that could never match.
The prefix check was a fast path but fragile — it missed valid HTML like <img\tsrc="...">. parseImgAttrs already uses html.ParseFragment which correctly detects any <img> variant, and returns empty src for non-img input. The prefix check was redundant and less correct.
Previously only http/https were detected upfront; schemes like data:, ftp:, mailto: fell through to ResolveLocalAttachments and relied on ErrNotExist as an accidental fallback. Now any non-empty scheme is routed directly to the URL branch, making the intent explicit and ResolveLocalAttachments only called for bare relative paths.
…comments Use htmlstdlib.EscapeString consistently for URL escaping instead of only replacing & — html.ParseFragment decodes all HTML entities in attribute values so <, >, and " also need re-escaping for valid XML.
…ct URL scheme detection
- URL and missing-file-fallback branches now apply calculateDisplayWidth
so full-width external images correctly emit ac:width="1800" instead
of the raw pixel value, consistent with the attachment branch.
- Replace u.Scheme != "" with an explicit allowlist (http, https, ftp,
ftps, data, mailto) to avoid misclassifying filenames containing colons
(e.g. images:foo.png) as URLs.
…Scheme Previously the rejection block used u.Opaque == "" to detect hierarchical URLs, which also caught harmless schemes like sftp://, s3://, ssh:// and aborted the document walk with WalkStop. blob: URIs slipped past the guard into local file resolution. Replace with an explicit isDangerousScheme allowlist (javascript, vbscript, file) so only truly dangerous schemes abort the walk.
…te rendering src attribute values are HTML-entity-decoded by html.ParseFragment, so a filename like arch&logo.png becomes arch&logo.png. The convertAttachment template function only replaces / with _ and does not XML-escape, so the unescaped & produced malformed XML in ri:filename. Similarly, a width value containing XML-special characters was passed to ac:width unescaped. Escape width alongside alt and title at the top of tryRenderImgTag, and escape Filename at the assignment site.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
Comments suppressed due to low confidence (1)
renderer/htmlblock.go:1
- The new
breakmeans multi-line HTML blocks (l > 1) stop scanning after the first line, so Confluence directive comment markers (e.g.<!-- ac:layout -->) on line 2+ will no longer be recognized. To preserve prior behavior, scan all lines for these markers regardless ofl, and only branch into the<img>conversion paths separately (e.g., after the marker scan completes).
package renderer
…st path, and doc updates
…s, and layout comments
…ecure cross-platform absolute paths
| // Force ri:url fallback for absolute paths or Windows UNC/drive paths | ||
| // to prevent local file exfiltration outside relative directories. | ||
| if filepath.IsAbs(sanitizedSrc) || isWindowsDrivePath(sanitizedSrc) || strings.HasPrefix(sanitizedSrc, "\\\\") { | ||
| return r.renderImgURL(w, sanitizedSrc, width, alt, title) | ||
| } | ||
|
|
||
| if r.Attachments == nil { | ||
| return r.renderImgURL(w, sanitizedSrc, width, alt, title) | ||
| } | ||
|
|
||
| attachments, err := attachment.ResolveLocalAttachments(vfs.LocalOS, filepath.Dir(r.Path), []string{sanitizedSrc}) | ||
| if err != nil { | ||
| return r.renderImgURL(w, sanitizedSrc, width, alt, title) | ||
| } | ||
| if len(attachments) == 0 { | ||
| return r.renderImgURL(w, sanitizedSrc, width, alt, title) | ||
| } |
| func validateImgTagConversionInput(src, width string) (string, bool, error) { | ||
| // Sanitize src by stripping leading/trailing whitespace and ASCII control characters | ||
| src = strings.TrimFunc(src, func(r rune) bool { | ||
| return r <= ' ' || r == 127 | ||
| }) | ||
|
|
||
| if src == "" { | ||
| return "", false, nil | ||
| } | ||
|
|
||
| if u, err := url.Parse(src); err == nil { | ||
| scheme := strings.ToLower(u.Scheme) | ||
| if isDangerousScheme(scheme) { | ||
| return "", false, fmt.Errorf("img src %q: unsupported URL scheme %q", src, u.Scheme) | ||
| } | ||
| } |
| r.Attachments.Attach(attachments[0]) | ||
|
|
||
| // Width from the <img> tag takes precedence over the detected file width. | ||
| effectiveWidth := width | ||
| if effectiveWidth == "" { | ||
| effectiveWidth = attachments[0].Width | ||
| } | ||
|
|
||
| effectiveAlign := calculateAlign(r.ImageAlign, effectiveWidth) | ||
| effectiveLayout := calculateLayout(effectiveAlign, effectiveWidth) | ||
| displayWidth := calculateDisplayWidth(effectiveWidth, effectiveLayout) | ||
|
|
||
| err = r.Stdlib.Templates.ExecuteTemplate(w, "ac:image", acImageParams{ | ||
| Align: effectiveAlign, | ||
| Layout: effectiveLayout, | ||
| OriginalWidth: attachments[0].Width, | ||
| OriginalHeight: attachments[0].Height, | ||
| Width: htmlstdlib.EscapeString(displayWidth), | ||
| Title: htmlstdlib.EscapeString(title), | ||
| Alt: htmlstdlib.EscapeString(alt), | ||
| Attachment: htmlstdlib.EscapeString(attachments[0].Filename), | ||
| }) | ||
| if err != nil { | ||
| return ast.WalkStop, err | ||
| } |
| #### Validation & Safety Rules | ||
|
|
||
| * **Width:** Must be a positive integer without units or surrounding whitespace (e.g., `width="600"` is valid; `width="100%"` or `width="600px"` are invalid and will skip conversion, falling back to raw HTML). | ||
| * **Scheme Sanitization:** Dangerous URL schemes (such as `javascript:`, `vbscript:`, or `file:`) in the `src` attribute are strictly blocked, returning a compilation error instead of outputting unsafe content. | ||
| * **Safe-mode Integration:** Standalone `<img>` tags are converted and resolved even in safe mode (when other unsafe raw HTML is disabled) because their content, schemes, and attributes are strictly sanitized and whitelisted. | ||
| * **Absolute Path Fallback:** To prevent system file access or path traversal, absolute paths (e.g. `/etc/...` or `C:\...`) and UNC paths (`\\server\...`) are not attached locally; they are safely fallback-rendered as external links (`ri:url`). |
|
Closing this in favor of #789. I ended up down a rabbit hole trying to build a perfect solution to parse the entire HTML, which added a lot of unnecessary complexity just to enable |

Feature: Convert
HTML tags to
ac:imagein Confluence outputMarkdown natively supports HTML
<img>tags, and authors commonly use them to control image size while keeping documents readable in standard Markdown renderers like GitHub.Mark had no handling for these tags, so they would be stripped or passed through as raw HTML, which Confluence does not render.
This change detects
<img>tags in HTML blocks and converts them to Confluence'sac:imageformat, consistent with how Markdown-syntax images (![]()) are already handled. Local files are uploaded as attachments, external URLs are linked directly. The width, alt, and title attributes are all carried over.The practical benefit over the existing macro-based approach (
<!-- width=300 -->) is that sizing an image no longer requires two separate steps. A single<img>tag handles the upload and the size in one pass, using syntax that works in many Markdown renderer without mark-specific extensions.The PR also adds an integration test covering the three main paths (local attachment with attributes, external URL, URL with special characters) and documents the feature in the README.
In addition I think this also fixes #491 and #325