MarkdownV2Parser.parse ignores backslash escapes and deviates from the spec in several places #830

### Problem

`MarkdownV2Parser.parse` (in `gramjs/extensions/markdownv2.ts`) departs from
the [Telegram MarkdownV2 spec](https://core.telegram.org/bots/api#markdownv2-style)
in several ways:

1. **Backslash escapes are not honored.** Per the spec, `\X` for any X in
 ``_*[]()~`>#+-=|{}.!`` becomes the literal X, and `\\` becomes `\`. Today
 these are passed through verbatim:
 - Input `1\.5` → output text `1\.5` (expected `1.5`).
 - Input `\*not bold\*` → output text `not bold` (expected literal
 text `*not bold*` with no entity, since the delimiters are escaped).

2. **Italic uses `-` instead of `_`.** The current code matches `-text-` for
 italic. The spec uses `_text_` and reserves `__text__` for underline.

3. **No blockquote support.** Per spec, lines beginning with `>` form a
 blockquote, and a final line ending in `||` marks it as expandable
 (`MessageEntityBlockquote.collapsed = true`). Today these are emitted as
 literal `>` chars.

4. **Per-region escape rules are not applied.** Inside `pre` and `code` only
 `\\` and `` \` `` should unescape; inside the `(URL)` of a link or custom
 emoji only `\\` and `\)` should unescape. There's no pass to apply this
 selectively.

5. **HTML special characters in plain text confuse the downstream HTML
 parser.** A user-typed `<` is not escaped before being handed to
 `HTMLParser.parse`, so `<b>not bold</b>` typed into MarkdownV2 input is
 incorrectly interpreted as bold.

6. **`HTMLParser` is missing some Telegram HTML-spec tags** that
 `htmlToMarkdownV2` (and external callers) need to round-trip cleanly:
 `<tg-spoiler>`, ``, `<ins>` (underline
 alternative), `<strike>` (strikethrough alternative). And
 `HTMLParser.unparse` emits the library-internal `<spoiler>` tag (not in
 the spec) and drops the `expandable` attribute on collapsed blockquotes,
 so the flag doesn't survive round-trips.
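
To make the escape rules in items 1 and 4 concrete, here is a minimal sketch of region-aware unescaping. It is not the current gramjs code: `unescapeRegion`, the `Region` type, and the `ESCAPABLE` constant are hypothetical names, and the character sets are taken only from the descriptions above.

```typescript
// Characters that may be backslash-escaped in ordinary text (item 1).
const ESCAPABLE = "_*[]()~`>#+-=|{}.!";

// Hypothetical region labels: plain text, pre/code, and the (URL) part
// of a link or custom emoji.
type Region = "text" | "code" | "url";

// Hypothetical helper: drop the backslash only when the following
// character is escapable in the given region; otherwise keep both.
function unescapeRegion(input: string, region: Region): string {
  const allowed =
    region === "code" ? "\\`" : region === "url" ? "\\)" : "\\" + ESCAPABLE;
  let out = "";
  for (let i = 0; i < input.length; i++) {
    const ch = input.charAt(i);
    const next = input.charAt(i + 1); // "" when at end of input
    if (ch === "\\" && next !== "" && allowed.includes(next)) {
      out += next;
      i++; // skip the escaped character
    } else {
      out += ch;
    }
  }
  return out;
}
```

Under these assumptions, `unescapeRegion("1\\.5", "text")` yields `1.5`, while the same input in the `"code"` region is left untouched because `.` is not escapable there.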

### Proposal

Rewrite the markdown→HTML transform inside the existing
`markdown → HTML → HTMLParser` pipeline as a staged process: extract
protected regions (pre/code/link/emoji) up front with their own escape
rules; mask remaining backslash-escapes; HTML-escape `&` and `<` in user
content; run span and blockquote markup; unmask; restore protected regions.
Switch italic to `_` per spec. Expose `markdownV2ToHtml` and
`htmlToMarkdownV2` as standalone functions so external callers can convert
between formats. Patch `HTMLParser` to recognize the missing tag forms and
to preserve the `expandable` attribute on round-trip.
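
As a rough illustration of the staged approach, the sketch below covers only the mask, HTML-escape, span-markup, and unmask stages. Protected regions and blockquotes are left out, and every name here (`maskEscapes`, `escapeHtml`, `unmaskEscapes`, `sketchToHtml`) is hypothetical rather than the proposed API. It assumes the input contains no Private Use Area characters, which are used as mask sentinels.

```typescript
// Escapable characters plus the backslash itself.
const ESCAPABLE_CHARS = "_*[]()~`>#+-=|{}.!\\";
const MASK_BASE = 0xe000; // start of the Unicode Private Use Area

// Stage 1: replace each `\X` with a sentinel so markup regexes skip it.
function maskEscapes(src: string): string {
  let out = "";
  for (let i = 0; i < src.length; i++) {
    const idx =
      src.charAt(i) === "\\" && i + 1 < src.length
        ? ESCAPABLE_CHARS.indexOf(src.charAt(i + 1))
        : -1;
    if (idx >= 0) {
      out += String.fromCharCode(MASK_BASE + idx);
      i++; // consume the escaped character
    } else {
      out += src.charAt(i);
    }
  }
  return out;
}

// Stage 2: HTML-escape `&` and `<` in user content.
function escapeHtml(src: string): string {
  return src.replace(/&/g, "&amp;").replace(/</g, "&lt;");
}

// Stage 4: turn sentinels back into their literal characters.
function unmaskEscapes(src: string): string {
  return src.replace(/[\uE000-\uE0FF]/g, (c) => {
    const ch = ESCAPABLE_CHARS.charAt(c.charCodeAt(0) - MASK_BASE);
    return ch === "" ? c : ch;
  });
}

function sketchToHtml(src: string): string {
  let s = escapeHtml(maskEscapes(src));
  // Stage 3: span markup. Underline (`__`) must run before italic (`_`).
  s = s.replace(/\*([^*]+)\*/g, "<b>$1</b>");
  s = s.replace(/__([^_]+)__/g, "<u>$1</u>");
  s = s.replace(/_([^_]+)_/g, "<i>$1</i>");
  return unmaskEscapes(s);
}
```

Masking before the markup pass is what keeps `\*not bold\*` from being matched as bold: the regexes never see the escaped asterisks, and the unmask stage restores them as literals afterwards.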
