Skip to content

Icons Registry: improve SVG sanitizer#75550

Open
t-hamano wants to merge 36 commits into
trunkfrom
icon-registry-sanitizer
Open

Icons Registry: improve SVG sanitizer#75550
t-hamano wants to merge 36 commits into
trunkfrom
icon-registry-sanitizer

Conversation

@t-hamano

@t-hamano t-hamano commented Feb 15, 2026

Copy link
Copy Markdown
Contributor

What?

In the experimental icon registry, any string can be registered as SVG content. Unfortunately, there is still no core function to properly sanitize SVGs, so we've enhanced the private methods for sanitization and defined as many SVG-friendly tags and attributes as possible.

Testing Instructions

Please refer to the unit test to see what the function expects.

@t-hamano t-hamano self-assigned this Feb 15, 2026
@github-project-automation github-project-automation Bot moved this to 🔎 Needs Review in WordPress 7.0 Editor Tasks Feb 15, 2026
@t-hamano t-hamano added [Type] Experimental Experimental feature or API. [Block] Icon Affects the Icon block labels Feb 15, 2026
@t-hamano t-hamano force-pushed the icon-registry-sanitizer branch from e537f45 to 7bdd4c1 Compare February 15, 2026 08:58
Comment on lines +748 to +750
if ( ! preg_match( '/<svg\b/i', $filtered_content ) ) {
return '';
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wp_kses() removes disallowed tags but not the text inside them, so I added a logic that returns an empty string if the tag is not an SVG element.

Comment on lines +66 to +67
'<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" width="24" height="24" class="icon" aria-hidden="true"><path d="M0 0" fill="currentColor" /></svg>',
'<svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24" width="24" height="24" class="icon" aria-hidden="true"><path d="M0 0" fill="currentColor" /></svg>',

@t-hamano t-hamano Feb 15, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was surprised to see wp_kses() convert camel case attribute names (viewBox) to lower case (viewbox). Lower-case attribute names should be recognized correctly by most browsers, but I'm not sure if this should be allowed.

cc @dmsnell @sirreal

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I've tested, lowercase viewbox is automatically recognized as viewBox by major browsers, so as long as we're using inline SVG as HTML content, we shouldn't have any problems.

<!DOCTYPE html>
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Document</title>
</head>
<body>

<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100" width="100">
  <rect x="0" y="0" width="100%" height="100%" />
  <circle cx="50%" cy="50%" r="4" fill="white" />
</svg>
<svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 100 100" width="100">
  <rect x="0" y="0" width="100%" height="100%" />
  <circle cx="50%" cy="50%" r="4" fill="white" />
</svg>

</body>
</html>
viewbox

@t-hamano t-hamano changed the title Icons Regstry: improveme SVG sanitizer Icons Regstry: improve SVG sanitizer Feb 15, 2026
@t-hamano t-hamano marked this pull request as ready for review February 15, 2026 09:09
@github-actions

github-actions Bot commented Feb 15, 2026

Copy link
Copy Markdown

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

If you're merging code through a pull request on GitHub, copy and paste the following into the bottom of the merge commit message.

Co-authored-by: t-hamano <wildworks@git.wordpress.org>
Co-authored-by: tyxla <tyxla@git.wordpress.org>
Co-authored-by: sirreal <jonsurrell@git.wordpress.org>
Co-authored-by: westonruter <westonruter@git.wordpress.org>
Co-authored-by: dmsnell <dmsnell@git.wordpress.org>

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

@mcsf mcsf changed the title Icons Regstry: improve SVG sanitizer Icons Registry: improve SVG sanitizer Feb 17, 2026
@dmsnell

dmsnell commented Feb 20, 2026

Copy link
Copy Markdown
Member

@t-hamano this seems like a good improvement since it retains the structure of the existing sanitization while adding extra attributes. did you verify that none of the newly-allowed attributes allows for script execution?

one note here is that SVG is complicated to parse, and even the search for <svg\b seems a bit suspicious. the HTML Processor can parse all valid inline SVGs, and one thing that might be useful here is to prepare the input using that and then dump the output into kses after normalization, where things that seem sneaky might be less prone to issues in kses.

$svg = '';
$processor = WP_HTML_Processor::create_fragment( $icon_content );
if ( ! $processor->next_token() || 'SVG' !== $processor->get_tag() ) {
	return '';
}

$svg .= $processor->serialize_token();
$svg_depth = $processor->get_current_depth();
while ( $processor->next_token() && $processor->get_current_depth() > $svg_depth ) {
	$svg .= $processor->serialize_token();
}

// Parsing has stopped even though there might be additional content.
$filtered = wp_kses( ... )

This could also be done after filtering with wp_kses() as it’s likely to corrupt the SVG, but normalizing first should help avoid a lot of problems. For instance, wp_kses() will be unaware of this but a single <p> terminates the SVG element. In fact, many tags that appear within inline SVG will end it and jump back into HTML. If we first extract the SVG in its entirety with the HTML Processor then we wouldn’t have to consider this.

Unfortunately there could be issues with self-closing tags, which exist within the SVG content but otherwise not in the HTML. In HTML, <svg/> is an entire SVG element, even though HTML contains no self-closing tags (the <svg/> tag is actually an SVG tag so the self-closing rule applies).

sticking with wp_kses() is appropriate apart from all of this. I have proposed dmsnell/wordpress-develop#20 and several other changes that will feed up into wp_kses() (WordPress/wordpress-develop#7409, WordPress/wordpress-develop#9270, and WordPress/wordpress-develop#9851 for example), so ultimately that should become more reliable.

for gradual improvement I think this is good as long as we have a thorough review of the attributes and tags we’re allowing.

* @return string The sanitized icon SVG content.
*/
private function sanitize_icon_content( $icon_content ) {
// Core attributes applicable to most elements.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears to be using the elements and attributes from my earlier comment, can you confirm?

Would be nice to validate those, as I haven't gone in-depth in that, and there are plenty of elements and attributes that SVGs can support. Also we know that any KSES-related changes shouldn't be taken lightly 😉

@t-hamano

Copy link
Copy Markdown
Contributor Author

Sorry for the late reply. I'm working with @mcsf to build the infrastructure to allow the Icon API to be extended on the Gutenberg plugin. If that works out, I'll come back to this PR.

@t-hamano t-hamano added [Type] Enhancement A suggestion for improvement. and removed [Type] Experimental Experimental feature or API. labels Feb 25, 2026
@t-hamano t-hamano added [Feature] Icons Related to Icon registration API and Icon REST API and removed [Block] Icon Affects the Icon block labels Mar 3, 2026
@t-hamano t-hamano moved this from 🔎 Needs Review to 🦵 Punted to 7.1 in WordPress 7.0 Editor Tasks Mar 3, 2026
@t-hamano t-hamano force-pushed the icon-registry-sanitizer branch from 7bdd4c1 to 26a51db Compare March 12, 2026 08:51
t-hamano and others added 5 commits June 17, 2026 13:18
Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Claude <noreply@anthropic.com>
SVG markup may legitimately begin with an XML declaration (parsed as a
bogus comment), comments, a doctype, or whitespace. Advance past those
leading tokens instead of assuming the first token is the SVG element,
so such icons are no longer wrongly rejected.

Co-Authored-By: Claude <noreply@anthropic.com>
Return null instead of an empty string when the content is not valid
SVG, so failure is signalled distinctly rather than through an empty
string. The caller already treats both as invalid via empty().

Co-Authored-By: Claude <noreply@anthropic.com>
Comment thread lib/class-wp-icons-registry-gutenberg.php Outdated
t-hamano and others added 3 commits June 17, 2026 13:50
WP_Icons_Registry::sanitize_icon_content() in WordPress core declares no
parameter type, so adding 'string' to the override narrows the parameter
and triggers a fatal incompatible-declaration error on WP 7.0 and trunk.
Leave the parameter untyped: it accepts any future parent parameter type
(contravariance), while the covariant ?string return type is kept.

Co-Authored-By: Claude <noreply@anthropic.com>
This reverts commit 29eca95.

The override must stay signature-compatible with
WP_Icons_Registry::sanitize_icon_content() in WordPress core, which is
untyped and returns an empty string on failure. Restore the empty-string
failure returns, drop the ?string return type, and revert the test
expectations accordingly.

Co-Authored-By: Claude <noreply@anthropic.com>
Convert the per-element attribute literals inside $allowed_tags from
array<string, true> to array_fill_keys( string[], true ), matching the
shared attribute groups already converted earlier.

Co-Authored-By: Claude <noreply@anthropic.com>
Comment thread backport-changelog/7.1/12197.md
t-hamano added 2 commits June 17, 2026 16:24
The file was an accidental copy with a placeholder name. Renaming it to
12197.md aligns it with the wordpress-develop PR it references.
setAccessible() is a no-op since PHP 8.1, where reflection can access
private methods by default. Guarding the call avoids the now-redundant
invocation on modern PHP versions.
@github-actions

github-actions Bot commented Jun 17, 2026

Copy link
Copy Markdown

Flaky tests detected in e39539b.
Some tests passed with failed attempts. The failures may not be related to this commit but are still reported for visibility. See the documentation for more information.

🔍 Workflow run URL: https://github.com/WordPress/gutenberg/actions/runs/27810208343
📝 Reported issues:

Comment thread lib/class-wp-icons-registry-gutenberg.php Outdated
Comment thread phpunit/experimental/class-wp-icons-registry-gutenberg-test.php Outdated
Comment thread lib/class-wp-icons-registry-gutenberg.php Outdated
Comment thread lib/class-wp-icons-registry-gutenberg.php Outdated
Comment thread lib/class-wp-icons-registry-gutenberg.php Outdated
Comment thread lib/class-wp-icons-registry-gutenberg.php Outdated
t-hamano and others added 4 commits June 19, 2026 13:29
Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Claude <noreply@anthropic.com>
…ists

Replace the repeated array_fill_keys( array( ... ), true ) calls in
sanitize_icon_content() with a single variadic helper, removing the
boilerplate duplicated across every SVG element definition.

Co-Authored-By: Claude <noreply@anthropic.com>
Comment thread lib/class-wp-icons-registry-gutenberg.php Outdated
Comment thread lib/class-wp-icons-registry-gutenberg.php Outdated
Comment on lines +896 to +898
'#comment' === $token_type
|| '#doctype' === $token_type
|| ( '#text' === $token_type && '' === trim( $processor->get_modifiable_text() ) )

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about #cdata-section?

I also wondered about the XML Processing Instruction, whether this would be interpreted as #funky-comment. But it doesn't seem so.

Also, it seems the #doctype token type doesn't come through. It is parsed as a comment.

See PHP Playground

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also wondered about the XML Processing Instruction, whether this would be interpreted as #funky-comment. But it doesn't seem so.

Also, it seems the #doctype token type doesn't come through. It is parsed as a comment.

As I understand it, this might be an expected behavior. Quoting from #75550 (comment):

the HTML Processor is parsing SVGs as they would appear inline in HTML.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about #cdata-section?

This block ignores some allowed content before the <svg> tag or reject the entire content.

This really depends on how strict things are expected to be. The fragment parser starts in HTML where CDATA cannot appear. "CDATA sections" are actually comments at this point.

I don't know how valuable it is to be more strict at this point, we could do things like only allow the <?xml … ?> which the HTML API recognizes as COMMENT_AS_PI_NODE_LOOKALIKE-type comment with an xml tag name, but this just seems like noise.

I'm undecided, but I'd like this section to either become more strict or to become more relaxed.

A strict implementation would rigorously reject anything that doesn't conform to expectations:

<?xml … ?> (optional)
<!DOCTYPE svg …> (optional)
<svg></svg> (REQUIRED)
--- nothing else ---

Processing this requires the full HTML processor and needs to look for a tree like this:

├─#comment(COMMENT_AS_PI_NODE_LOOKALIKE)
│ └─xml  
├─DOCTYPE: svg
└─HTML
  ├─HEAD
  └─BODY
    └─svg:svg
      └─#text …

A more relaxed implementation could just call $processor->next_tag( 'SVG' ) and proceed, ignoring any surrounding content.

Right now, I'm leaning towards the relaxed implementation, but I can understand arguments for the strict version.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now, I'm leaning towards the relaxed implementation, but I can understand arguments for the strict version.

I don't have a strong opinion, but if #77260 allowed custom icon registration, consumers would likely try to register SVG icons in various formats. In that case, a relaxed version would be more acceptable.

I modified the logic in f4c78d9. This commit also includes alignment of unit test data.

t-hamano and others added 4 commits June 22, 2026 21:30
Co-authored-by: Weston Ruter <westonruter@gmail.com>
The trunk merge committed unresolved conflict markers, causing a PHP
parse error. Keep both branches' tests and move the sanitizer tests to
the end of the class.

Co-Authored-By: Claude <noreply@anthropic.com>
Comment on lines +896 to +898
'#comment' === $token_type
|| '#doctype' === $token_type
|| ( '#text' === $token_type && '' === trim( $processor->get_modifiable_text() ) )

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about #cdata-section?

This block ignores some allowed content before the <svg> tag or reject the entire content.

This really depends on how strict things are expected to be. The fragment parser starts in HTML where CDATA cannot appear. "CDATA sections" are actually comments at this point.

I don't know how valuable it is to be more strict at this point, we could do things like only allow the <?xml … ?> which the HTML API recognizes as COMMENT_AS_PI_NODE_LOOKALIKE-type comment with an xml tag name, but this just seems like noise.

I'm undecided, but I'd like this section to either become more strict or to become more relaxed.

A strict implementation would rigorously reject anything that doesn't conform to expectations:

<?xml … ?> (optional)
<!DOCTYPE svg …> (optional)
<svg></svg> (REQUIRED)
--- nothing else ---

Processing this requires the full HTML processor and needs to look for a tree like this:

├─#comment(COMMENT_AS_PI_NODE_LOOKALIKE)
│ └─xml  
├─DOCTYPE: svg
└─HTML
  ├─HEAD
  └─BODY
    └─svg:svg
      └─#text …

A more relaxed implementation could just call $processor->next_tag( 'SVG' ) and proceed, ignoring any surrounding content.

Right now, I'm leaning towards the relaxed implementation, but I can understand arguments for the strict version.

Comment on lines +906 to +908
if ( 'SVG' !== $processor->get_tag() ) {
return '';
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As implemented, I don't think this is a problem because a preceding <math> tag would be rejected. However, if we relax this to use ->next_tag( 'SVG' ) and proceed, it could find <math><svg> (an SVG element in the math namespace).

It would be good to check that this is an SVG tag in the svg namespace to prevent this kind of issue.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in f203e7d

t-hamano and others added 2 commits June 23, 2026 16:12
The SVG root check matched on tag name alone, so an <svg> in a foreign
namespace (e.g. the MathML-namespaced <svg> inside <math><svg>) was
treated as the icon root. Also require the element to be in the SVG
namespace so such elements are rejected.

Co-Authored-By: Claude <noreply@anthropic.com>
Locate the icon root via next_tag( 'SVG' ) and ignore content surrounding
it (XML declaration, doctype, comments, sibling elements), so real-world
SVG files that wrap the root are accepted. Surrounding content is only
ignored, never emitted, so wp_kses still bounds what is output. A second
top-level SVG is still rejected to keep the icon unambiguous, and the
namespace check continues to reject a foreign-namespaced <svg>.

Reorganize the sanitizer data provider so groups follow a single
criterion (behavior, labeled by comment; element category in the key)
and add cases for multiple/nested/foreign-namespaced roots and ignored
surrounding content.

Co-Authored-By: Claude <noreply@anthropic.com>
@mcsf mcsf removed their request for review June 23, 2026 15:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

[Feature] Icons Related to Icon registration API and Icon REST API [Type] Enhancement A suggestion for improvement.

Projects

Status: 🔎 Needs Review

Development

Successfully merging this pull request may close these issues.

5 participants