FR #1359: add "Sentence per line" content rule #1529
Open
counterposition wants to merge 2 commits into
Splits root-level paragraph prose so each sentence is on its own source line, making line-based diffs localized to the edited sentence (semantic line breaks / ventilated prose). Implements platers#1359.
- new dependency-free, atom-opaque sentence splitter (sentence-splitting.ts) with a token layer so masked placeholders / inline HTML are never split inside; configurable terminators + abbreviation suppression
- reflowParagraphsOneSentencePerLine + detectTrailingLineBreakIndicator in mdast.ts: root-child paragraph selection, structural-divider handling so masked blocks (tables, %% comments, custom-ignore) adjacent to prose are preserved, hard-break (<br>/<br/>/\\/two-space) preservation
- symmetric conflict handling with two-spaces-between-lines-with-content (UI modal both directions + settings-load guard in main.ts)
- en.ts locale, additional-info docs (leads with the Obsidian "Strict line breaks" rendering caveat), regenerated content-rules.md, README entry
- unit + ruleTest coverage incl. idempotency, B1/B2/B3/B4/M2/M5/M7/M8/P1
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
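The hard-break preservation mentioned in this commit can be illustrated with a small sketch. `trailingHardBreak` and `mergeSoftWrappedLines` are hypothetical names chosen for illustration; they are not the PR's `detectTrailingLineBreakIndicator`, whose signature is not shown on this page. The sketch assumes a hard break is only recognized at the very end of a source line.

```typescript
// Hypothetical helpers for illustration only; not the code in mdast.ts.
type HardBreak = "<br>" | "<br/>" | "\\" | "  " | null;

// Returns which hard-break marker (if any) ends the given source line.
function trailingHardBreak(line: string): HardBreak {
  if (/<br>$/i.test(line)) return "<br>";
  if (/<br\s*\/>$/i.test(line)) return "<br/>";
  if (line.endsWith("\\")) return "\\";
  // Two or more trailing spaces after visible content.
  if (/\S {2,}$/.test(line)) return "  ";
  return null;
}

// Joins soft-wrapped lines with a single space, but never merges across a
// line that ends in a hard-break marker.
function mergeSoftWrappedLines(lines: string[]): string[] {
  const out: string[] = [];
  let current = "";
  for (const raw of lines) {
    if (raw.trim() === "") continue; // paragraphs contain no blank lines
    const isBreak = trailingHardBreak(raw) !== null;
    // Keep the marker intact on hard-break lines; drop stray trailing space otherwise.
    const line = isBreak ? raw : raw.trimEnd();
    current = current === "" ? line : `${current} ${line.trimStart()}`;
    if (isBreak) {
      out.push(current);
      current = "";
    }
  }
  if (current !== "") out.push(current);
  return out;
}
```

A sentence splitter would then run on each merged chunk separately, so text on either side of a hard break is never joined into one sentence.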
Adversarial review follow-ups:
- URL/email autolinks are now opaque atoms. `url` masking leaves the
surrounding `<>` of <https://x> behind as `<{URL_PLACEHOLDER}>`; the
scanner only saw the inner placeholder, so a sentence after an autolink
was not split and a lone autolink line was wrongly merged into prose
(violating the plan's "a single autolink is a divider" invariant). Added
a CommonMark-autolink alternative (wrapped placeholder, absolute-URI, and
email forms) to both the token regex and the divider regex.
- Astral sentence terminators (e.g. emoji) are dropped during sanitization
instead of being accepted but never matched, since the scan compares
single UTF-16 code units. Documented as a BMP-only limitation.
Added unit + ruleTest coverage for both; full suite 1274 passing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
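The BMP-only limitation can be shown with a minimal sketch. `sanitizeTerminators` and `isTerminatorAt` are hypothetical names, and the exact sanitization rules here (dropping whitespace and duplicates) are assumptions; the commit only states that astral terminators are dropped because the scan compares single UTF-16 code units.

```typescript
// Hypothetical sanitizer for illustration; not the PR's implementation.
// An astral character such as an emoji occupies two UTF-16 code units
// (a surrogate pair), so a scan that compares one code unit at a time
// can never match it. Dropping it up front avoids a silently dead option.
function sanitizeTerminators(raw: string): string[] {
  const kept: string[] = [];
  for (let i = 0; i < raw.length; i++) {
    const unit = raw.charCodeAt(i);
    // 0xD800-0xDFFF are surrogate halves: part of an astral character
    // (or a lone surrogate); either way they are not usable terminators.
    if (unit >= 0xd800 && unit <= 0xdfff) continue;
    const ch = raw.charAt(i);
    if (/\s/.test(ch)) continue;              // whitespace is never a terminator
    if (kept.indexOf(ch) === -1) kept.push(ch); // de-duplicate
  }
  return kept;
}

// The per-position check a code-unit scanner would perform.
function isTerminatorAt(text: string, index: number, terminators: string[]): boolean {
  return terminators.indexOf(text.charAt(index)) !== -1;
}
```

For example, passing `".?!"` followed by an emoji and `"\u3002"` keeps the three ASCII terminators and the ideographic full stop, and discards the emoji.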
Why
When Markdown is tracked in git (or any line-based VCS), keeping each sentence on its own source line makes diffs localized: editing one sentence produces a one-line patch instead of re-flowing an entire paragraph. This "semantic line breaks" / "ventilated prose" convention is requested in #1359 ("FR: Put each sentence on its own line"). The single comment on that issue asks for the sentence-terminating characters to be configurable — this PR exposes that (plus an abbreviation list) as options.
Rendering caveat (the first thing a reviewer/user will rightly ask). Under standard Markdown a single newline is a space, so rendered output is unchanged — but Obsidian's "Strict line breaks" setting is off by default, and with it off Obsidian renders a single newline as a visible line break. So for default-configured Obsidian users this rule does change reading-view rendering. This is stated plainly in the rule description, the settings docs, and the rule's additional-info page rather than buried — it's the key "is this safe for me" decision.
Fixes #1359
What it does
Reflows only root-level paragraphs so each sentence is on its own line. Headings, lists, blockquotes, tables, fenced/indented code, math blocks, HTML blocks, footnote definitions, and YAML are left untouched (by AST construction, not string-sniffing). Reflowing prose nested in lists/blockquotes is intentionally out of scope for v1.
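As a rough illustration of reflowing a paragraph to one sentence per line, here is a deliberately simplified sketch. It is not the PR's `sentence-splitting.ts`: `SplitOptions`, `splitSentences`, and `reflowParagraph` are hypothetical names, the abbreviation list is only an example, and the sketch assumes plain text with no masked placeholders, inline HTML, autolinks, or hard breaks.

```typescript
// Illustrative options; the shape and names are assumptions, not the PR's API.
interface SplitOptions {
  asciiTerminators: string; // split only when followed by whitespace
  wideTerminators: string;  // split even without trailing whitespace
  abbreviations: string[];  // a trailing match suppresses the split
}

const DEFAULTS: SplitOptions = {
  asciiTerminators: ".?!",
  // Ideographic full stop, full-width ! and ?, horizontal ellipsis.
  wideTerminators: "\u3002\uFF01\uFF1F\u2026",
  abbreviations: ["e.g.", "U.S.", "a.m."],
};

// True when the text ends in a listed abbreviation or a single-letter initial.
function endsWithSuppressedWord(candidate: string, abbreviations: string[]): boolean {
  const match = /\S+$/.exec(candidate);
  const word = match ? match[0].replace(/^[("'\[]+/, "") : "";
  return abbreviations.indexOf(word) !== -1 || /^[A-Z]\.$/.test(word);
}

// Single left-to-right pass over UTF-16 code units.
function splitSentences(text: string, opts: SplitOptions = DEFAULTS): string[] {
  const sentences: string[] = [];
  let start = 0;
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    let boundary = false;
    if (opts.wideTerminators.indexOf(ch) !== -1) {
      // Absorb a run of wide terminators so it stays with its sentence.
      while (i + 1 < text.length && opts.wideTerminators.indexOf(text.charAt(i + 1)) !== -1) i++;
      boundary = true;
    } else if (opts.asciiTerminators.indexOf(ch) !== -1) {
      const next = text.charAt(i + 1);
      // A dot directly after another dot is treated as part of an ellipsis.
      const isEllipsisDot = ch === "." && text.charAt(i - 1) === ".";
      // Decimals and versions (1.2) fail the whitespace test on their own.
      boundary = /\s/.test(next) && !isEllipsisDot &&
        !endsWithSuppressedWord(text.slice(start, i + 1), opts.abbreviations);
    }
    if (boundary) {
      const sentence = text.slice(start, i + 1).trim();
      if (sentence !== "") sentences.push(sentence);
      start = i + 1;
    }
  }
  const tail = text.slice(start).trim();
  if (tail !== "") sentences.push(tail);
  return sentences;
}

// Unwraps soft line breaks, then emits one sentence per line.
function reflowParagraph(paragraph: string, opts: SplitOptions = DEFAULTS): string {
  const joined = paragraph.replace(/\s*\n\s*/g, " ");
  return splitSentences(joined, opts).join("\n");
}
```

Because suppression wins whenever an abbreviation or initial precedes the whitespace, this sketch errs toward not splitting, and running `reflowParagraph` on its own output leaves the text unchanged.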
Sentence detection is a deterministic, dependency-free, O(n) scanner with a token layer so masked placeholders, inline HTML, and autolinks are fully opaque (never split inside, valid sentence starts). It is precision-first: configurable terminators (ASCII `.?!` require trailing whitespace; CJK/full-width `。！？` and `…` do not), an abbreviation list (`e.g.`, `U.S.`, `a.m.`, …), and suppression of single-letter initials, decimals/versions, and ellipses. Deliberate, documented heuristic limitations (lowercase/digit sentence starts, lone-capital endings, half-width `.` + CJK, mid-line `<br>`) are listed in the rule docs.
Safety properties:
- … `_`/`}`/`.` can't corrupt a placeholder on restoration.
- … `%%comment%%`, custom-ignore, lone link/autolink) adjacent to prose with no blank line is preserved verbatim with its adjacency intact (it is not this rule's job to add blank lines; that's `paragraph-blank-lines`).
- …, `<br>`, `<br/>`, `\`) are preserved and never merged across.
- … `paragraph-blank-lines` ↔ `two-spaces-between-lines-with-content` precedent symmetrically: a confirm-modal on either UI toggle, plus a settings-load guard in `main.ts` for the `data.json`/sync path.
Files
New: `src/utils/sentence-splitting.ts`, `src/rules/sentence-per-line.ts`, two test suites, `docs/additional-info/rules/sentence-per-line.md`.
Touched: `src/utils/mdast.ts` (two helpers), `src/main.ts` (conflict block), `src/rules/two-spaces-between-lines-with-content.ts` (symmetric wiring), `src/lang/locale/en.ts`, generated `content-rules.md`, README rule list.
Notes for the reviewer
- `docs/contributing/adding-a-rule.md`: FR exists (FR: Put each sentence on its own line #1359), rule copied from the template, options/examples/locale wired the standard way.
- … `format-yaml-array` precedent (`optionsKey` ≠ config key) so a `TextAreaOptionBuilder` default/override doesn't hit the `.split` path.
- `npm run docs` regenerated README + `content-rules.md`; I scoped this PR to only the new rule and intentionally did not sweep in unrelated pre-existing doc-regeneration drift (`heading-rules.md` and some README anchors) from an earlier source-only change. Happy to send that separately if you'd prefer.
- … `TextArea` default (the abbreviation list) with raw newlines in the options-table cell. This rule is the first with a non-empty `TextArea` default, so there's no prior art for how you'd like it formatted. Glad to adjust the generator or the default presentation per your preference.
Test plan
- `npm run test`: full suite green (this branch: 1274 passing, incl. 65 new cases)
- `npm run lint`: clean
- `npm run build`: clean
- `%%comment%%`/custom-ignore block adjacent to prose is not corrupted; the conflict modal fires for both toggle directions; enabling both rules via `data.json` disables `two-spaces…` on next load with a notice.
🤖 This change was developed with AI assistance (noted via `Co-Authored-By` trailers on the commits); a human reviewed and is submitting it. Happy to walk through any part of the design.