FR #1359: add "Sentence per line" content rule #1529

Open
counterposition wants to merge 2 commits into platers:master from counterposition:feature/sentence-per-line

Conversation

@counterposition

Why

When Markdown is tracked in git (or any line-based VCS), keeping each sentence on its own source line makes diffs localized: editing one sentence produces a one-line patch instead of re-flowing an entire paragraph. This "semantic line breaks" / "ventilated prose" convention is requested in #1359 ("FR: Put each sentence on its own line"). The single comment on that issue asks for the sentence-terminating characters to be configurable — this PR exposes that (plus an abbreviation list) as options.

Rendering caveat (the first thing a reviewer/user will rightly ask). Under standard Markdown a single newline is a space, so rendered output is unchanged — but Obsidian's "Strict line breaks" setting is off by default, and with it off Obsidian renders a single newline as a visible line break. So for default-configured Obsidian users this rule does change reading-view rendering. This is stated plainly in the rule description, the settings docs, and the rule's additional-info page rather than buried — it's the key "is this safe for me" decision.

Fixes #1359

What it does

Reflows only root-level paragraphs so each sentence is on its own line. Headings, lists, blockquotes, tables, fenced/indented code, math blocks, HTML blocks, footnote definitions, and YAML are left untouched (by AST construction, not string-sniffing). Reflowing prose nested in lists/blockquotes is intentionally out of scope for v1.

Sentence detection is a deterministic, dependency-free, O(n) scanner with a token layer so masked placeholders, inline HTML, and autolinks are fully opaque (the scanner never splits inside them, and they count as valid sentence starts). It is precision-first: configurable terminators (ASCII .?! require trailing whitespace; CJK/full-width terminators such as 。！？ do not), an abbreviation list (e.g., U.S., a.m., …), and suppression of single-letter initials, decimals/versions, and ellipses. Deliberate, documented heuristic limitations (lowercase/digit sentence starts, lone-capital endings, half-width .+CJK, mid-line <br>) are listed in the rule docs.
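For readers who want a concrete picture of the precision-first rules above, here is a minimal sketch. It is not the code in src/utils/sentence-splitting.ts: `SplitOptions`, `splitSentences`, and `isSuppressed` are hypothetical names, the abbreviation list is an illustrative subset, and the token layer for masked placeholders, inline HTML, and autolinks is left out entirely.

```typescript
// Illustrative sketch only; names and defaults are assumptions, not the PR's API.
interface SplitOptions {
  asciiTerminators: string; // only count when followed by whitespace or end of text
  cjkTerminators: string;   // count even with no trailing whitespace
  abbreviations: string[];  // words that end in "." without ending a sentence
}

const defaultOptions: SplitOptions = {
  asciiTerminators: ".?!",
  cjkTerminators: "。！？",
  abbreviations: ["U.S.", "a.m.", "p.m.", "e.g.", "i.e."],
};

// Returns true when a "." at index i should NOT end the sentence.
function isSuppressed(text: string, sentenceStart: number, i: number, opts: SplitOptions): boolean {
  if (text.charAt(i) !== ".") return false;
  // Ellipsis: a dot directly preceded by another dot.
  if (i > 0 && text.charAt(i - 1) === ".") return true;
  // Walk back to the start of the whitespace-delimited word ending at i.
  let w = i;
  while (w > sentenceStart && !/\s/.test(text.charAt(w - 1))) w--;
  const word = text.slice(w, i + 1);
  // Single-letter initial such as "J."
  if (/^[A-Z]\.$/.test(word)) return true;
  return opts.abbreviations.includes(word);
}

function splitSentences(text: string, opts: SplitOptions = defaultOptions): string[] {
  const sentences: string[] = [];
  let start = 0;
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    let boundary = false;
    if (opts.cjkTerminators.includes(ch)) {
      boundary = true;
    } else if (opts.asciiTerminators.includes(ch)) {
      const atEnd = i + 1 >= text.length;
      const followedBySpace = atEnd || /\s/.test(text.charAt(i + 1));
      // Decimals and versions ("2.5") fail the whitespace test, so they never split.
      boundary = followedBySpace && !isSuppressed(text, start, i, opts);
    }
    if (boundary) {
      const sentence = text.slice(start, i + 1).trim();
      if (sentence.length > 0) sentences.push(sentence);
      start = i + 1;
    }
  }
  const tail = text.slice(start).trim();
  if (tail.length > 0) sentences.push(tail);
  return sentences;
}

console.log(splitSentences("J. Smith arrived at 9 a.m. in the U.S. capital. Version 2.5 shipped! Wait... really?"));
```

Being precision-first has a cost that this sketch shares: an abbreviation that really does end a sentence ("...in the U.S. Then...") is not split, which errs on the side of leaving prose alone.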

Safety properties:

  • Masking-aware: operates on the framework's placeholder-masked text; terminator membership is never tested on atom characters, so a user terminator like _/}/. can't corrupt a placeholder on restoration.
  • Idempotent by construction (join is the exact inverse of split); covered by tests that apply the rule twice.
  • Structural dividers: a masked block (table, %%comment%%, custom-ignore, lone link/autolink) adjacent to prose with no blank line is preserved verbatim with its adjacency intact (it is not this rule's job to add blank lines — that's paragraph-blank-lines).
  • Hard breaks (two trailing spaces, <br>, <br/>, and a trailing \) are preserved and never merged across.
  • Conflict handling mirrors the existing paragraph-blank-lines / two-spaces-between-lines-with-content precedent symmetrically: a confirm-modal on either UI toggle, plus a settings-load guard in main.ts for the data.json/sync path.
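The idempotency and hard-break properties in the list above can be illustrated with a small sketch. It makes several assumptions: `reflowParagraph`, `naiveSplit`, and `HARD_BREAK` are hypothetical names, the splitter is a deliberately naive stand-in (no abbreviation handling), and the real logic in src/utils/mdast.ts works on masked AST paragraphs rather than raw strings.

```typescript
// Illustrative sketch only; not the PR's reflowParagraphsOneSentencePerLine.

// A line ends in a hard break if it ends with 2+ spaces, a backslash, or <br> / <br/>.
const HARD_BREAK = /( {2,}|\\|<br\s*\/?>)$/i;

// Naive stand-in splitter: a sentence ends at .?! followed by whitespace or end of text.
function naiveSplit(text: string): string[] {
  return text.match(/\S.*?[.?!](?=\s|$)|\S.*$/g) ?? [];
}

function reflowParagraph(paragraph: string): string {
  const out: string[] = [];
  let segment: string[] = [];

  // Join the pending lines, split them into sentences, and re-attach the
  // hard-break marker (if any) to the last sentence of the segment.
  const flush = (hardBreakSuffix: string): void => {
    const joined = segment
      .map((line) => line.trim())
      .filter((line) => line.length > 0)
      .join(" ");
    const sentences = naiveSplit(joined);
    if (sentences.length > 0) {
      sentences[sentences.length - 1] += hardBreakSuffix;
      out.push(...sentences);
    }
    segment = [];
  };

  for (const line of paragraph.split("\n")) {
    const match = HARD_BREAK.exec(line);
    if (match) {
      // Text is never merged across a hard break: close the segment here.
      segment.push(line.slice(0, match.index));
      flush(match[0]);
    } else {
      segment.push(line);
    }
  }
  flush("");
  return out.join("\n");
}

const sample = "One. Two is\nstill two.  \nThree? Four.";
const once = reflowParagraph(sample);
const twice = reflowParagraph(once);
console.log(once === twice); // second application is a no-op for this input
```

Joining a segment with single spaces and then splitting it again yields the same sentence list, which is why a second pass changes nothing here. The sketch demonstrates that only for this input; the PR's tests cover it by applying the rule twice.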

Files

New: src/utils/sentence-splitting.ts, src/rules/sentence-per-line.ts, two test suites, docs/additional-info/rules/sentence-per-line.md. Touched: src/utils/mdast.ts (two helpers), src/main.ts (conflict block), src/rules/two-spaces-between-lines-with-content.ts (symmetric wiring), src/lang/locale/en.ts, generated content-rules.md, README rule list.

Notes for the reviewer

  • Followed docs/contributing/adding-a-rule.md: FR exists (FR: Put each sentence on its own line #1359), rule copied from the template, options/examples/locale wired the standard way.
  • The abbreviation option uses the format-yaml-array precedent (optionsKey ≠ config key) so a TextAreaOptionBuilder default/override doesn't hit the .split path.
  • npm run docs regenerated README + content-rules.md; I scoped this PR to only the new rule and intentionally did not sweep in unrelated pre-existing doc-regeneration drift (heading-rules.md and some README anchors) from an earlier source-only change — happy to send that separately if you'd prefer.
  • One known cosmetic: the generator renders a non-empty TextArea default (the abbreviation list) with raw newlines in the options-table cell. This rule is the first with a non-empty TextArea default, so there's no prior art for how you'd like it formatted — glad to adjust the generator or the default presentation per your preference.

Test plan

  • npm run test — full suite green (this branch: 1274 passing, incl. 65 new cases)
  • npm run lint — clean
  • npm run build — clean
  • Manual in a vault: rendering unchanged with "Strict line breaks" on; re-running is a no-op; a table/code/%%comment%%/custom-ignore block adjacent to prose is not corrupted; the conflict modal fires for both toggle directions; enabling both rules via data.json disables two-spaces… on next load with a notice.
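The data.json case in the last manual check can be pictured with a rough sketch of a settings-load guard. Everything here is an assumption for illustration: the `RuleConfigs` shape, the rule keys, the `guardConflictingRules` name, and the notice wording are hypothetical and do not reflect the actual code in src/main.ts.

```typescript
// Illustrative sketch only; the settings shape and rule keys are assumed.
interface RuleConfig {
  enabled: boolean;
}
type RuleConfigs = Record<string, RuleConfig | undefined>;

const SENTENCE_PER_LINE = "sentence-per-line";
const TWO_SPACES = "two-spaces-between-lines-with-content";

interface GuardResult {
  configs: RuleConfigs;
  notice: string | null; // message to show the user, or null if nothing changed
}

// If both conflicting rules arrive enabled (for example via a synced data.json),
// disable the two-spaces rule and report it. The input object is not mutated.
function guardConflictingRules(configs: RuleConfigs): GuardResult {
  const bothEnabled =
    configs[SENTENCE_PER_LINE]?.enabled === true &&
    configs[TWO_SPACES]?.enabled === true;
  if (!bothEnabled) {
    return { configs, notice: null };
  }
  return {
    configs: {
      ...configs,
      [TWO_SPACES]: { ...configs[TWO_SPACES], enabled: false },
    },
    notice: `"${TWO_SPACES}" was disabled because it conflicts with "${SENTENCE_PER_LINE}".`,
  };
}

const loaded: RuleConfigs = {
  [SENTENCE_PER_LINE]: { enabled: true },
  [TWO_SPACES]: { enabled: true },
};
console.log(guardConflictingRules(loaded).notice);
```

Running the guard at load time, in addition to the confirm-modal on the toggles, covers settings that change without going through the UI.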

🤖 This change was developed with AI assistance (noted via Co-Authored-By trailers on the commits); a human reviewed and is submitting it. Happy to walk through any part of the design.

counterposition and others added 2 commits May 18, 2026 13:44
Splits root-level paragraph prose so each sentence is on its own source
line, making line-based diffs localized to the edited sentence (semantic
line breaks / ventilated prose). Implements platers#1359.

- new dependency-free, atom-opaque sentence splitter (sentence-splitting.ts)
  with a token layer so masked placeholders / inline HTML are never split
  inside; configurable terminators + abbreviation suppression
- reflowParagraphsOneSentencePerLine + detectTrailingLineBreakIndicator in
  mdast.ts: root-child paragraph selection, structural-divider handling so
  masked blocks (tables, %% comments, custom-ignore) adjacent to prose are
  preserved, hard-break (<br>/<br/>/\\/two-space) preservation
- symmetric conflict handling with two-spaces-between-lines-with-content
  (UI modal both directions + settings-load guard in main.ts)
- en.ts locale, additional-info docs (leads with the Obsidian "Strict line
  breaks" rendering caveat), regenerated content-rules.md, README entry
- unit + ruleTest coverage incl. idempotency, B1/B2/B3/B4/M2/M5/M7/M8/P1

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adversarial review follow-ups:

- URL/email autolinks are now opaque atoms. `url` masking leaves the
  surrounding `<>` of <https://x> behind as `<{URL_PLACEHOLDER}>`; the
  scanner only saw the inner placeholder, so a sentence after an autolink
  was not split and a lone autolink line was wrongly merged into prose
  (violating the plan's "a single autolink is a divider" invariant). Added
  a CommonMark-autolink alternative (wrapped placeholder, absolute-URI, and
  email forms) to both the token regex and the divider regex.
- Astral sentence terminators (e.g. emoji) are dropped during sanitization
  instead of being accepted but never matched, since the scan compares
  single UTF-16 code units. Documented as a BMP-only limitation.

Added unit + ruleTest coverage for both; full suite 1274 passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
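
The BMP-only limitation in the second commit comes from how JavaScript strings are indexed: a scan that compares one UTF-16 code unit at a time can never match a character stored as a surrogate pair. A minimal sketch of that kind of sanitization follows. `sanitizeTerminators` is a hypothetical name, and the de-duplication step is an extra assumption of this sketch, not something the commit message states.

```typescript
// Illustrative sketch only; not the PR's sanitization code.

// Keep only terminators that occupy a single UTF-16 code unit (BMP characters).
// Astral characters such as most emoji are stored as surrogate pairs (length 2),
// so a per-code-unit comparison like text.charAt(i) === terminator cannot match them.
function sanitizeTerminators(raw: string): string {
  const kept: string[] = [];
  // Array.from iterates a string by code point, so a surrogate pair stays together.
  for (const ch of Array.from(raw)) {
    if (ch.length !== 1) continue;    // astral: drop instead of silently never matching
    if (kept.includes(ch)) continue;  // assumption: duplicates are collapsed
    kept.push(ch);
  }
  return kept.join("");
}

console.log("🙂".length);                        // 2: one code point, two code units
console.log(sanitizeTerminators(".?!。🙂."));     // the emoji and the repeated "." are dropped
```

Dropping the unsupported character up front makes the configured value reflect what the scanner will actually honor.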