fix(xml): support unquoted HTML named entities #141
No actionable comments were generated in the recent review. 🎉
ℹ️ Recent review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID:
📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (1)
Summary by CodeRabbit
Walkthrough
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/test_xml_like.py (1)
294-315: ⚡ Quick win
Add a regression test for apostrophes in text before a named entity.
Please add a case like `<p>It's &copy;</p>` to ensure text apostrophes do not disable later entity normalization. This directly protects the new scanner logic from quote-state regressions.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_xml_like.py` around lines 294-315, add a new regression test method after the existing test methods in the test class that verifies apostrophes in text do not disable entity normalization. Create a test case with XML content containing an apostrophe in the text (like "It's") followed by a named entity (like "&copy;"), parse it using XMLLikeNode, and assert that the named entity is properly normalized to its Unicode equivalent. This ensures the scanner logic correctly distinguishes between text apostrophes and quote characters that delimit attribute values.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@epub_translator/xml/xml_like.py`:
- Around lines 240-249: The quote-state tracking logic at lines 242-243 is
applied to all content, including plain text nodes, so ordinary text
apostrophes such as the one in "It's" incorrectly set quote mode, which then
breaks entity handling for subsequent entities like "&copy;". Quote state should
only be tracked inside tag contexts (between '<' and '>'). Introduce a boolean
flag that tracks whether you are currently inside a tag, set it to true when
encountering '<' and false when encountering '>', and only execute the
quote-tracking logic (the line that updates the quote variable with the
conditional expression) when this flag is true. This ensures apostrophes in
plain text do not interfere with the quote state machine used for parsing
attributes.
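
The in-tag flag described above can be sketched roughly as follows. This is not the code in `epub_translator/xml/xml_like.py`: the function name `normalize_named_entities`, the choice to leave entities inside quoted attribute values untouched, and the use of `html.entities.name2codepoint` as the entity table are all assumptions made for illustration.

```python
import re
from html.entities import name2codepoint

# Hypothetical helper names; the real scanner in xml_like.py may differ.
_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")
_XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}  # already valid XML


def normalize_named_entities(source: str) -> str:
    """Replace HTML named entities with Unicode characters.

    Quote state is tracked only while inside a tag, so an apostrophe
    in plain text never switches the scanner into quote mode.
    """
    out = []
    in_tag = False
    quote = None  # the active quote character inside a tag, if any
    i = 0
    while i < len(source):
        ch = source[i]
        if in_tag:
            if quote is not None:
                if ch == quote:
                    quote = None  # closing quote of an attribute value
            elif ch in ("'", '"'):
                quote = ch  # opening quote of an attribute value
            elif ch == ">":
                in_tag = False  # only an unquoted '>' ends the tag
        elif ch == "<":
            in_tag = True

        # Assumption: entities inside quoted values are left as-is.
        if ch == "&" and quote is None:
            match = _ENTITY.match(source, i)
            if match:
                name = match.group(1)
                if name not in _XML_PREDEFINED and name in name2codepoint:
                    out.append(chr(name2codepoint[name]))
                    i = match.end()
                    continue
        out.append(ch)
        i += 1
    return "".join(out)
```

In this sketch a '>' inside a quoted attribute value does not clear the flag, which is slightly stricter than the plain "false when encountering '>'" rule in the comment; whether the real scanner needs that distinction depends on the input it is expected to handle.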
---
Nitpick comments:
In `@tests/test_xml_like.py`:
- Around lines 294-315: Add a new regression test method after the existing test
methods in the test class that verifies apostrophes in text do not disable
entity normalization. Create a test case with XML content containing an
apostrophe in the text (like "It's") followed by a named entity (like "&copy;"),
parse it using XMLLikeNode, and assert that the named entity is properly
normalized to its Unicode equivalent. This ensures the scanner logic correctly
distinguishes between text apostrophes and quote characters that delimit
attribute values.
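
One possible shape for that regression test is sketched below. Because the constructor and accessors of `XMLLikeNode` are not shown in this thread, the test runs against a hypothetical stand-in function, `normalize_text_entities`, which only normalizes entities in text between tags; in the real test file the assertions would go through `XMLLikeNode` instead.

```python
import re
import unittest
from html.entities import name2codepoint

_TAG = re.compile(r"(<[^>]*>)")
_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")
_XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def normalize_text_entities(xml_text: str) -> str:
    """Hypothetical stand-in for the parser under test."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name in _XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return chr(name2codepoint[name])

    # re.split with a capturing group puts tags at odd indexes.
    parts = _TAG.split(xml_text)
    return "".join(
        part if index % 2 else _ENTITY.sub(replace, part)
        for index, part in enumerate(parts)
    )


class ApostropheBeforeEntityTest(unittest.TestCase):
    def test_single_apostrophe_before_entity(self):
        result = normalize_text_entities("<p>It's &copy;</p>")
        self.assertEqual(result, "<p>It's \u00a9</p>")

    def test_apostrophe_in_earlier_element(self):
        # An unbalanced apostrophe must not leak into later elements.
        result = normalize_text_entities("<p>It's</p><p>&copy;</p>")
        self.assertEqual(result, "<p>It's</p><p>\u00a9</p>")
```

The second case is an extra guard not requested in the review: it checks that quote state from one text node cannot carry over into a sibling element.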
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: fa84fa23-10e1-421a-bcc8-0921603fff51
📒 Files selected for processing (3)
epub_translator/xml/xml_like.py
pyproject.toml
tests/test_xml_like.py
Summary
Tests