Draft: Catala command to create a pandoc markdown #1024

Draft
Arnaud-Bihan wants to merge 3 commits into CatalaLang:master from functori:arnaud@functori@literate

Conversation

@Arnaud-Bihan

Contributor

This adds a new catala command that produces Pandoc Markdown. From that Markdown, you can use pandoc to generate any kind of file pandoc supports. As an example, I've also created a backend that generates a PDF. Markdown, HTML and PDF files are generated, but there are still some issues (for example, links do not work properly in the PDF, while they work fine in HTML and Markdown).
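
As a rough illustration of the second step (running pandoc over the generated Markdown), here is a minimal OCaml sketch. It is not the code from this PR: the function names and the file names `doc.md` and `doc.pdf` are hypothetical, and it only assumes pandoc's basic `pandoc INPUT -o OUTPUT` invocation, where the output format is inferred from the output file extension. `Filename.quote_command` requires OCaml 4.10 or later.

```ocaml
(* Sketch only: run pandoc on an already generated Markdown file.
   All names here are hypothetical, not taken from the PR. *)

(* Arguments for "pandoc INPUT -o OUTPUT"; pandoc infers the output
   format from the extension of OUTPUT. *)
let pandoc_args ~(input : string) ~(output : string) : string list =
  [input; "-o"; output]

(* Runs pandoc through the shell and returns its exit code.
   Requires pandoc to be available on the PATH. *)
let run_pandoc ~(input : string) ~(output : string) : int =
  Sys.command (Filename.quote_command "pandoc" (pandoc_args ~input ~output))
```

For instance, `run_pandoc ~input:"doc.md" ~output:"doc.pdf"` would ask pandoc for a PDF, which additionally needs a PDF engine (a LaTeX distribution by default) to be installed.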

@AltGr

AltGr commented Apr 30, 2026

Contributor

Awesome! Once this is mature enough, we can backport the html and latex commands to just be sugar for a call to pandoc on top of it.

@denismerigoux denismerigoux left a comment

Contributor

Thank you!! I guess we could now lift these commands to clerk so that we can produce a .md, .tex or .html file per target ;)

Comment thread compiler/literate/md.ml

let rec fmt_toc
(parents_headings : string list)
fmt

Contributor

So here's a long-standing Markdown feature we currently do not support: what about really long titles that need to be split across multiple lines, like:

## Such a very long title we need 
## two lines to write it

Right now Catala parses it as two different titles of the same level. I don't think supporting this is in the scope of this PR, but it gives you an idea of where we would want to go in terms of functionality. The pain point in supporting this feature, for instance, is that heading parsing currently goes through the Catala parser, so we would need to tweak the Catala parser just for that small Markdown feature, while the Catala parser ignores most other Markdown features (because we delegate them to Pandoc later).
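
To make the desired behaviour concrete, here is a small OCaml sketch of a line-based pass that merges consecutive headings of the same level into one. It is purely illustrative and is not how the Catala parser handles headings; `heading_level` and `merge_headings` are hypothetical names. Note the ambiguity it introduces: two genuinely distinct same-level headings on adjacent lines would also be merged, so in this sketch a blank line is what separates them.

```ocaml
(* Sketch only: merge consecutive Markdown headings of the same level.
   Hypothetical line-based pass, not the Catala parser. *)

(* Returns [Some (level, title)] when [line] starts with one or more '#'
   followed by a space, and [None] otherwise. *)
let heading_level (line : string) : (int * string) option =
  let n = String.length line in
  let rec count i = if i < n && line.[i] = '#' then count (i + 1) else i in
  let level = count 0 in
  if level > 0 && level < n && line.[level] = ' ' then
    Some (level, String.trim (String.sub line level (n - level)))
  else None

let merge_headings (lines : string list) : string list =
  (* Emits the heading being accumulated, if any, onto the reversed output. *)
  let flush pending acc =
    match pending with
    | None -> acc
    | Some (level, title) -> (String.make level '#' ^ " " ^ title) :: acc
  in
  let rec go pending acc = function
    | [] -> List.rev (flush pending acc)
    | line :: rest -> (
      match heading_level line, pending with
      | Some (l, t), Some (pl, pt) when l = pl ->
        (* Same level as the previous heading line: extend its title. *)
        go (Some (pl, pt ^ " " ^ t)) acc rest
      | Some (l, t), _ -> go (Some (l, t)) (flush pending acc) rest
      | None, _ -> go None (line :: flush pending acc) rest)
  in
  go None [] lines
```

Applied to the two-line example above, this yields the single heading `## Such a very long title we need two lines to write it`.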

Comment thread compiler/literate/md.mli
Comment on lines +2 to +3
and social benefits computation rules. Copyright (C) 2020 Inria, contributor:
Denis Merigoux <denis.merigoux@inria.fr>

Contributor

You can change the name and date here to something more accurate, like:

Copyright (C) 2026 Inria, contributor: Arnaud Bihan <XXXXX>

Comment thread compiler/driver.ml
$ Cli.Flags.wrap_weaved_output
$ Cli.Flags.extra_files)

let pdf options output print_only_law wrap_weaved_output =

Contributor

This is very nice, but the PDF generated by Pandoc is very ugly, no? Right now our pipeline of choice for generating PDFs goes through LaTeX, with the preamble set out in

https://github.com/CatalaLang/catala/blob/master/compiler/literate/latex.ml#L52-L165

I would prefer not to roll out an out-of-the-box PDF feature that provides an uglier PDF than the one we're already providing with our current preferred pipeline.

Yes, you're right: as of right now the feature is far from complete. That command was just meant as an example of what is possible, and as a way to test the PDF result with a single command instead of two (generating the pandoc markdown and then calling pandoc targeting PDF).

Contributor Author

Mmmh, after checking, I may have introduced something wrong in the last version, because the PDF is indeed not working as expected. It was working much better before; now there are a lot of {...} that should not appear.

Contributor

Amazing, I did not know pandoc supported that!!

@AltGr

AltGr commented May 7, 2026

Contributor

While your added syntax highlighting is pretty neat, I am a bit wary of having yet another syntax coloring spec in yet another format (I am losing count of how many we have :/)

An option could be to try and leverage the fancy highlighting based on the tree-sitter syntax that we already use for the catala-book. One possible way to do this would be to preprocess all catala code blocks with something like tree-sitter highlight --html --scope catala-code-en and insert the HTML blocks directly in the intermediate markdown. I don't think there is a way yet to insert this highlighting directly in pandoc (unless of course we write our own filter).

  • upsides:
    • better coloring
    • less maintenance burden for the rules
  • downsides:
    • we depend on the tree-sitter binary in addition to pandoc. It's nothing in size compared to texlive, but it is still one more thing to worry about in our installation (note that catala-format already has this dependency)
    • the intermediate markdown is less readable (but that may not be a problem, if you want readable, you have the raw sources on one side and the html/pdf on the other)
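
The preprocessing idea above could be sketched roughly as follows in OCaml. This is only an illustration under several assumptions: that Catala blocks appear in the intermediate markdown as fenced blocks tagged `catala`, that the highlighter is supplied by the caller as a hypothetical `highlight` callback (a real one could shell out to the tree-sitter command quoted above), and that the resulting HTML is embedded using Pandoc's raw attribute syntax (`{=html}`). Raw HTML blocks are only kept by pandoc for HTML-like outputs, so a PDF produced through LaTeX would need a different treatment.

```ocaml
(* Sketch only: replace fenced Catala code blocks in a Markdown document
   with raw HTML blocks produced by a caller-supplied highlighter.
   All names are hypothetical, not taken from the PR. *)

(* Three backticks, built programmatically to keep this example readable. *)
let fence = String.make 3 '`'

let replace_catala_blocks ~(highlight : string -> string)
    (lines : string list) : string list =
  let is_open l = String.trim l = fence ^ "catala" in
  let is_close l = String.trim l = fence in
  (* [acc] holds the output lines in reverse order. *)
  let rec outside acc = function
    | [] -> List.rev acc
    | l :: rest when is_open l -> inside acc [] rest
    | l :: rest -> outside (l :: acc) rest
  (* [code] holds the lines of the current block in reverse order. *)
  and inside acc code = function
    | [] ->
      (* Unterminated fence: emit the block without highlighting it. *)
      List.rev (code @ ((fence ^ "catala") :: acc))
    | l :: rest when is_close l ->
      let html = highlight (String.concat "\n" (List.rev code)) in
      (* The {=html} attribute marks the block as raw HTML for pandoc. *)
      outside (fence :: html :: (fence ^ "{=html}") :: acc) rest
    | l :: rest -> inside acc (l :: code) rest
  in
  outside [] lines
```

The highlighted HTML is kept as a single list element, possibly containing newlines, so joining the result with `String.concat "\n"` gives the final document.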

@Arnaud-Bihan Arnaud-Bihan force-pushed the arnaud@functori@literate branch from e2562d4 to d946a54 Compare June 5, 2026 09:16

Projects

Status: In Progress

4 participants