Icons Registry: improve SVG sanitizer#75550
Conversation
e537f45 to
7bdd4c1
Compare
| if ( ! preg_match( '/<svg\b/i', $filtered_content ) ) { | ||
| return ''; | ||
| } |
There was a problem hiding this comment.
wp_kses() removes disallowed tags but not the text inside them, so I added a logic that returns an empty string if the tag is not an SVG element.
| '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" width="24" height="24" class="icon" aria-hidden="true"><path d="M0 0" fill="currentColor" /></svg>', | ||
| '<svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24" width="24" height="24" class="icon" aria-hidden="true"><path d="M0 0" fill="currentColor" /></svg>', |
There was a problem hiding this comment.
As far as I've tested, lowercase viewbox is automatically recognized as viewBox by major browsers, so as long as we're using inline SVG as HTML content, we shouldn't have any problems.
<!DOCTYPE html>
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Document</title>
</head>
<body>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100" width="100">
<rect x="0" y="0" width="100%" height="100%" />
<circle cx="50%" cy="50%" r="4" fill="white" />
</svg>
<svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 100 100" width="100">
<rect x="0" y="0" width="100%" height="100%" />
<circle cx="50%" cy="50%" r="4" fill="white" />
</svg>
</body>
</html>
|
The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the If you're merging code through a pull request on GitHub, copy and paste the following into the bottom of the merge commit message. To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook. |
|
@t-hamano this seems like a good improvement since it retains the structure of the existing sanitization while adding extra attributes. did you verify that none of the newly-allowed attributes allows for script execution? one note here is that SVG is complicated to parse, and even the search for $svg = '';
$processor = WP_HTML_Processor::create_fragment( $icon_content );
if ( ! $processor->next_token() || 'SVG' !== $processor->get_tag() ) {
return '';
}
$svg .= $processor->serialize_token();
$svg_depth = $processor->get_current_depth();
while ( $processor->next_token() && $processor->get_current_depth() > $svg_depth ) {
$svg .= $processor->serialize_token();
}
// Parsing has stopped even though there might be additional content.
$filtered = wp_kses( ... )This could also be done after filtering with Unfortunately there could be issues with self-closing tags, which exist within the SVG content but otherwise not in the HTML. In HTML, sticking with for gradual improvement I think this is good as long as we have a thorough review of the attributes and tags we’re allowing. |
| * @return string The sanitized icon SVG content. | ||
| */ | ||
| private function sanitize_icon_content( $icon_content ) { | ||
| // Core attributes applicable to most elements. |
There was a problem hiding this comment.
This appears to be using the elements and attributes from my earlier comment, can you confirm?
Would be nice to validate those, as I haven't gone in-depth in that, and there are plenty of elements and attributes that SVGs can support. Also we know that any KSES-related changes shouldn't be taken lightly 😉
|
Sorry for the late reply. I'm working with @mcsf to build the infrastructure to allow the Icon API to be extended on the Gutenberg plugin. If that works out, I'll come back to this PR. |
7bdd4c1 to
26a51db
Compare
Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Claude <noreply@anthropic.com>
SVG markup may legitimately begin with an XML declaration (parsed as a bogus comment), comments, a doctype, or whitespace. Advance past those leading tokens instead of assuming the first token is the SVG element, so such icons are no longer wrongly rejected. Co-Authored-By: Claude <noreply@anthropic.com>
Return null instead of an empty string when the content is not valid SVG, so failure is signalled distinctly rather than through an empty string. The caller already treats both as invalid via empty(). Co-Authored-By: Claude <noreply@anthropic.com>
WP_Icons_Registry::sanitize_icon_content() in WordPress core declares no parameter type, so adding 'string' to the override narrows the parameter and triggers a fatal incompatible-declaration error on WP 7.0 and trunk. Leave the parameter untyped: it accepts any future parent parameter type (contravariance), while the covariant ?string return type is kept. Co-Authored-By: Claude <noreply@anthropic.com>
This reverts commit 29eca95. The override must stay signature-compatible with WP_Icons_Registry::sanitize_icon_content() in WordPress core, which is untyped and returns an empty string on failure. Restore the empty-string failure returns, drop the ?string return type, and revert the test expectations accordingly. Co-Authored-By: Claude <noreply@anthropic.com>
Convert the per-element attribute literals inside $allowed_tags from array<string, true> to array_fill_keys( string[], true ), matching the shared attribute groups already converted earlier. Co-Authored-By: Claude <noreply@anthropic.com>
The file was an accidental copy with a placeholder name. Renaming it to 12197.md aligns it with the wordpress-develop PR it references.
setAccessible() is a no-op since PHP 8.1, where reflection can access private methods by default. Guarding the call avoids the now-redundant invocation on modern PHP versions.
|
Flaky tests detected in e39539b. 🔍 Workflow run URL: https://github.com/WordPress/gutenberg/actions/runs/27810208343
|
Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Claude <noreply@anthropic.com>
…ists Replace the repeated array_fill_keys( array( ... ), true ) calls in sanitize_icon_content() with a single variadic helper, removing the boilerplate duplicated across every SVG element definition. Co-Authored-By: Claude <noreply@anthropic.com>
| '#comment' === $token_type | ||
| || '#doctype' === $token_type | ||
| || ( '#text' === $token_type && '' === trim( $processor->get_modifiable_text() ) ) |
There was a problem hiding this comment.
What about #cdata-section?
I also wondered about the XML Processing Instruction, whether this would be interpreted as #funky-comment. But it doesn't seem so.
Also, it seems the #doctype token type doesn't come through. It is parsed as a comment.
See PHP Playground
There was a problem hiding this comment.
I also wondered about the XML Processing Instruction, whether this would be interpreted as
#funky-comment. But it doesn't seem so.Also, it seems the
#doctypetoken type doesn't come through. It is parsed as a comment.
As I understand it, this might be an expected behavior. Quoting from #75550 (comment):
the HTML Processor is parsing SVGs as they would appear inline in HTML.
There was a problem hiding this comment.
What about #cdata-section?
This block ignores some allowed content before the <svg> tag or reject the entire content.
This really depends on how strict things are expected to be. The fragment parser starts in HTML where CDATA cannot appear. "CDATA sections" are actually comments at this point.
I don't know how valuable it is to be more strict at this point, we could do things like only allow the <?xml … ?> which the HTML API recognizes as COMMENT_AS_PI_NODE_LOOKALIKE-type comment with an xml tag name, but this just seems like noise.
I'm undecided, but I'd like this section to either become more strict or to become more relaxed.
A strict implementation would rigorously reject anything that doesn't conform to expectations:
<?xml … ?> (optional)
<!DOCTYPE svg …> (optional)
<svg>…</svg> (REQUIRED)
--- nothing else ---Processing this requires the full HTML processor and needs to look for a tree like this:
├─#comment(COMMENT_AS_PI_NODE_LOOKALIKE)
│ └─xml
├─DOCTYPE: svg
└─HTML
├─HEAD
└─BODY
└─svg:svg
└─#text …
A more relaxed implementation could just call $processor->next_tag( 'SVG' ) and proceed, ignoring any surrounding content.
Right now, I'm leaning towards the relaxed implementation, but I can understand arguments for the strict version.
There was a problem hiding this comment.
Right now, I'm leaning towards the relaxed implementation, but I can understand arguments for the strict version.
I don't have a strong opinion, but if #77260 allowed custom icon registration, consumers would likely try to register SVG icons in various formats. In that case, a relaxed version would be more acceptable.
I modified the logic in f4c78d9. This commit also includes alignment of unit test data.
Co-authored-by: Weston Ruter <westonruter@gmail.com>
The trunk merge committed unresolved conflict markers, causing a PHP parse error. Keep both branches' tests and move the sanitizer tests to the end of the class. Co-Authored-By: Claude <noreply@anthropic.com>
| '#comment' === $token_type | ||
| || '#doctype' === $token_type | ||
| || ( '#text' === $token_type && '' === trim( $processor->get_modifiable_text() ) ) |
There was a problem hiding this comment.
What about #cdata-section?
This block ignores some allowed content before the <svg> tag or reject the entire content.
This really depends on how strict things are expected to be. The fragment parser starts in HTML where CDATA cannot appear. "CDATA sections" are actually comments at this point.
I don't know how valuable it is to be more strict at this point, we could do things like only allow the <?xml … ?> which the HTML API recognizes as COMMENT_AS_PI_NODE_LOOKALIKE-type comment with an xml tag name, but this just seems like noise.
I'm undecided, but I'd like this section to either become more strict or to become more relaxed.
A strict implementation would rigorously reject anything that doesn't conform to expectations:
<?xml … ?> (optional)
<!DOCTYPE svg …> (optional)
<svg>…</svg> (REQUIRED)
--- nothing else ---Processing this requires the full HTML processor and needs to look for a tree like this:
├─#comment(COMMENT_AS_PI_NODE_LOOKALIKE)
│ └─xml
├─DOCTYPE: svg
└─HTML
├─HEAD
└─BODY
└─svg:svg
└─#text …
A more relaxed implementation could just call $processor->next_tag( 'SVG' ) and proceed, ignoring any surrounding content.
Right now, I'm leaning towards the relaxed implementation, but I can understand arguments for the strict version.
| if ( 'SVG' !== $processor->get_tag() ) { | ||
| return ''; | ||
| } |
There was a problem hiding this comment.
As implemented, I don't think this is a problem because a preceding <math> tag would be rejected. However, if we relax this to use ->next_tag( 'SVG' ) and proceed, it could find <math><svg> (an SVG element in the math namespace).
It would be good to check that this is an SVG tag in the svg namespace to prevent this kind of issue.
The SVG root check matched on tag name alone, so an <svg> in a foreign namespace (e.g. the MathML-namespaced <svg> inside <math><svg>) was treated as the icon root. Also require the element to be in the SVG namespace so such elements are rejected. Co-Authored-By: Claude <noreply@anthropic.com>
Locate the icon root via next_tag( 'SVG' ) and ignore content surrounding it (XML declaration, doctype, comments, sibling elements), so real-world SVG files that wrap the root are accepted. Surrounding content is only ignored, never emitted, so wp_kses still bounds what is output. A second top-level SVG is still rejected to keep the icon unambiguous, and the namespace check continues to reject a foreign-namespaced <svg>. Reorganize the sanitizer data provider so groups follow a single criterion (behavior, labeled by comment; element category in the key) and add cases for multiple/nested/foreign-namespaced roots and ignored surrounding content. Co-Authored-By: Claude <noreply@anthropic.com>
What?
In the experimental icon registry, any string can be registered as SVG content. Unfortunately, there is still no core function to properly sanitize SVGs, so we've enhanced the private methods for sanitization and defined as many SVG-friendly tags and attributes as possible.
Testing Instructions
Please refer to the unit test to see what the function expects.