Problem
MarkdownV2Parser.parse (in gramjs/extensions/markdownv2.ts) departs from
the Telegram MarkdownV2 spec in several ways:
- Backslash escapes are not honored. Per the spec, \X for any X in
  _*[]()~`>#+-=|{}.! becomes the literal X, and \\ becomes \. Today
  these are passed through verbatim:
  - Input 1\.5 → output text 1\.5 (expected 1.5).
  - Input \*not bold\* → output text <b>not bold</b> (expected the
    literal text *not bold* with no entity, since the delimiters are
    escaped).
- Italic uses - instead of _. The current code matches -text- for
  italic, whereas the spec uses _text_ and reserves __text__ for underline.
- No blockquote support. Per the spec, lines beginning with > form a
  blockquote, and a final line ending in || marks it as expandable
  (MessageEntityBlockquote.collapsed = true). Today these lines are
  emitted with literal > characters.
- Per-region escape rules are not applied. Inside pre and code, only
  \\ and \` should unescape; inside the (URL) of a link or custom
  emoji, only \\ and \) should unescape. There is no pass that applies
  these rules selectively.
- HTML special characters in plain text confuse the downstream HTML
  parser. A user-typed < is not escaped before the text is handed to
  HTMLParser.parse, so <b>not bold</b> typed into MarkdownV2 input is
  incorrectly interpreted as bold.
- HTMLParser is missing some Telegram HTML-spec tags that
  htmlToMarkdownV2 (and external callers) need to round-trip cleanly:
  <tg-spoiler>, <span class="tg-spoiler">, <ins> (underline
  alternative), and <strike> (strikethrough alternative). In addition,
  HTMLParser.unparse emits the library-internal <spoiler> tag (not in
  the spec) and drops the expandable attribute on collapsed blockquotes,
  so the flag does not survive a round-trip.
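The per-region escape rules above can be expressed as one unescape routine parameterized by region. This is a minimal TypeScript sketch, not gramjs code: the names unescapeRegion, Region, and REGION_ESCAPES are hypothetical. For ordinary text it uses the character list quoted above plus the backslash; the Telegram spec is broader (any character with code 1 to 126 may be escaped), so a real implementation may want the wider set.

```typescript
// Hypothetical helper, not part of gramjs: unescape MarkdownV2 text according
// to the region it came from.
type Region = "text" | "code" | "url";

// Characters whose backslash-escape is honored in each region:
// - text: the MarkdownV2 specials listed in this issue, plus "\" itself
// - code: inside pre and code, only "`" and "\"
// - url:  inside the (URL) of a link or custom emoji, only ")" and "\"
const REGION_ESCAPES: Record<Region, string> = {
    text: "_*[]()~`>#+-=|{}.!\\",
    code: "`\\",
    url: ")\\",
};

function unescapeRegion(region: Region, input: string): string {
    const allowed = REGION_ESCAPES[region];
    let out = "";
    for (let i = 0; i < input.length; i++) {
        const ch = input[i];
        const next = input[i + 1];
        if (ch === "\\" && next !== undefined && allowed.includes(next)) {
            // Drop the backslash and keep the escaped character literally.
            out += next;
            i++;
        } else {
            // Any other backslash is kept as an ordinary character.
            out += ch;
        }
    }
    return out;
}

unescapeRegion("text", "1\\.5"); // → 1.5
unescapeRegion("code", "a\\.b"); // → a\.b (unchanged: "." is not escapable in code)
unescapeRegion("url", "x\\)y");  // → x)y
```

Keeping the allowed set per region in one table makes it easy to check each region's rules against the spec in a single place.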
Proposal
Rewrite the markdown→HTML transform inside the existing
markdown → HTML → HTMLParser pipeline as a staged process:
- Extract protected regions (pre/code/link/emoji) up front, each with its
  own escape rules.
- Mask the remaining backslash-escapes.
- HTML-escape & and < in user content.
- Run span and blockquote markup.
- Unmask the escaped characters.
- Restore the protected regions.
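A minimal sketch of the middle stages (mask escapes, HTML-escape, blockquote and span markup, unmask) follows, assuming the protected regions (pre, code, link and emoji URLs) have already been extracted. The function name markdownV2InlineToHtml, the private-use placeholder scheme, and the choice to emit <tg-spoiler> (which presumes HTMLParser gains support for it as proposed in this issue) are illustrative assumptions, not the actual implementation. Spans are matched with simple non-greedy regexes that do not cross line breaks, so the spec's harder nesting cases are not handled.

```typescript
// Sketch of the mask / escape / markup / unmask stages for plain MarkdownV2 text.
// Assumes protected regions were already extracted and that the input never
// contains the private-use characters U+E000 or U+E001 (used as placeholders).
const MASK_OPEN = "\uE000";
const MASK_CLOSE = "\uE001";
const TEXT_ESCAPABLE = "_*[]()~`>#+-=|{}.!\\";

// Span rules, applied in order: "__" runs before "_" so underline wins.
const SPANS: Array<[RegExp, string]> = [
    [/__(.+?)__/g, "<u>$1</u>"],
    [/_(.+?)_/g, "<i>$1</i>"],
    [/\*(.+?)\*/g, "<b>$1</b>"],
    [/~(.+?)~/g, "<s>$1</s>"],
    [/\|\|(.+?)\|\|/g, "<tg-spoiler>$1</tg-spoiler>"],
];

function applyBlockquotes(text: string): string {
    const lines = text.split("\n");
    const out: string[] = [];
    let i = 0;
    while (i < lines.length) {
        if (!lines[i].startsWith(">")) {
            out.push(lines[i++]);
            continue;
        }
        // Collect a run of consecutive lines starting with ">".
        const quoted: string[] = [];
        while (i < lines.length && lines[i].startsWith(">")) {
            quoted.push(lines[i++].slice(1));
        }
        // A trailing "||" on the last quoted line marks the quote as expandable.
        const last = quoted.length - 1;
        const expandable = quoted[last].endsWith("||");
        if (expandable) quoted[last] = quoted[last].slice(0, -2);
        const open = expandable ? "<blockquote expandable>" : "<blockquote>";
        out.push(open + quoted.join("\n") + "</blockquote>");
    }
    return out.join("\n");
}

function markdownV2InlineToHtml(input: string): string {
    // 1. Mask escapes: "\X" becomes a placeholder so X can never act as markup.
    const saved: string[] = [];
    let text = input.replace(/\\([\s\S])/g, (match: string, ch: string) => {
        if (!TEXT_ESCAPABLE.includes(ch)) return match; // unknown escape: keep as-is
        saved.push(ch);
        return MASK_OPEN + (saved.length - 1) + MASK_CLOSE;
    });
    // 2. HTML-escape user content ("&" first so "&lt;" is not double-escaped).
    text = text.replace(/&/g, "&amp;").replace(/</g, "&lt;");
    // 3. Blockquotes before spans, so a trailing "||" is not read as a spoiler.
    text = applyBlockquotes(text);
    // 4. Inline spans.
    for (const [pattern, replacement] of SPANS) {
        text = text.replace(pattern, replacement);
    }
    // 5. Unmask: restore each escaped character as a literal.
    return text.replace(/\uE000(\d+)\uE001/g, (_m: string, n: string) => saved[Number(n)] ?? "");
}

markdownV2InlineToHtml("\\*not bold\\* and *bold*"); // → *not bold* and <b>bold</b>
markdownV2InlineToHtml("1\\.5 < 2");                 // → 1.5 &lt; 2
```

Because masking happens before every other stage, an escaped \* cannot open bold and an escaped \> at the start of a line cannot open a blockquote; HTML-escaping before markup means only the tags the converter itself inserts reach HTMLParser as tags.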
Switch italic to _ per spec. Expose markdownV2ToHtml and
htmlToMarkdownV2 as standalone functions so external callers can convert
between formats. Patch HTMLParser to recognize the missing tag forms and
to preserve the expandable attribute on round-trip.
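On the HTMLParser side, the patch amounts to accepting the alternative tag spellings on parse and emitting only spec tags on unparse. The sketch below is standalone: EntityKind, ParsedTag, classifyTag, openTag, and closeTag are hypothetical names (gramjs itself works with MessageEntity* objects), and continuing to accept the legacy <spoiler> tag on input is an assumption made for backward compatibility.

```typescript
// Hypothetical entity kinds for illustration; gramjs uses MessageEntity* objects.
type EntityKind = "bold" | "italic" | "underline" | "strike" | "spoiler" | "blockquote";

interface ParsedTag {
    kind: EntityKind;
    collapsed?: boolean; // only meaningful for blockquotes
}

// Parse side: accept every tag spelling from the Telegram HTML spec.
function classifyTag(name: string, attrs: Record<string, string | null>): ParsedTag | undefined {
    switch (name.toLowerCase()) {
        case "b":
        case "strong":
            return { kind: "bold" };
        case "i":
        case "em":
            return { kind: "italic" };
        case "u":
        case "ins":
            return { kind: "underline" };
        case "s":
        case "strike":
        case "del":
            return { kind: "strike" };
        case "tg-spoiler":
        case "spoiler": // legacy library-internal tag, still accepted on input
            return { kind: "spoiler" };
        case "span":
            return attrs["class"] === "tg-spoiler" ? { kind: "spoiler" } : undefined;
        case "blockquote":
            // The valueless `expandable` attribute marks a collapsed quote.
            return { kind: "blockquote", collapsed: "expandable" in attrs };
        default:
            return undefined;
    }
}

// Unparse side: always emit spec tag names, never the internal <spoiler>.
const SPEC_TAG: Record<EntityKind, string> = {
    bold: "b",
    italic: "i",
    underline: "u",
    strike: "s",
    spoiler: "tg-spoiler",
    blockquote: "blockquote",
};

function openTag(tag: ParsedTag): string {
    const name = SPEC_TAG[tag.kind];
    return tag.kind === "blockquote" && tag.collapsed ? `<${name} expandable>` : `<${name}>`;
}

function closeTag(tag: ParsedTag): string {
    return `</${SPEC_TAG[tag.kind]}>`;
}

openTag(classifyTag("blockquote", { expandable: null })!); // → <blockquote expandable>
openTag(classifyTag("span", { class: "tg-spoiler" })!);    // → <tg-spoiler>
```

In this sketch, collapsed plays the role of MessageEntityBlockquote.collapsed, so an expandable quote parses to collapsed: true and unparses back to <blockquote expandable> instead of losing the flag.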