Skip to content

Introduce line breaking in format command#486

Draft
sorenhartmann wants to merge 15 commits into
PlasmaFAIR:format-commandfrom
sorenhartmann:line-break
Draft

Introduce line breaking in format command#486
sorenhartmann wants to merge 15 commits into
PlasmaFAIR:format-commandfrom
sorenhartmann:line-break

Conversation

@sorenhartmann

Copy link
Copy Markdown

WIP

@ZedThree

Copy link
Copy Markdown
Member

Thanks @sorenhartmann! You probably want to rebase on top of the most recent format-command, I've moved things about

@sorenhartmann

sorenhartmann commented Aug 13, 2025

Copy link
Copy Markdown
Author

Alright so this is a (perhaps crude) implementation of a line wrapping algorithm for Fortran. In large part, this is inspired by how black / ruff seems to go about deciding when and where to break different language nodes. In particular, being as consistent as possible and breaking outer scopes before inner scopes.

The basic approach is as follows

  1. Use tree-sitter queries to identify syntax nodes that can be broken across multiple lines - and how these breaks would occur.
  2. Walk the syntax tree and identify the outermost syntax node(s) containing long lines that can be broken.
  3. Break each of these nodes
  4. Repeat until either all lines obey the line_length constraint, or they can not be broken any further

I noticed that this approach is quite close to what topiary is already doing, however as far as I could figure out, applying new lines conditional on line length and syntax tree level is still not really something topiary does.

Some notes

  • Syntax nodes of identical type inherit breaks from direct children. This is to allow e.g. math_expressions to be broken in the same scope, but not if they are contained in e.g. parenthesis expressions.
  • I record and apply new indentation when breaking lines, however this is mainly due to figuring out when to stop breaking lines since indentation is immediately stripped by topiary. This also means that the indentation semantics should also be represented by the topiary query
  • I strip all existing line continuations before breaking new lines. I believe this is necessary, both due to consistency and since we would want to re-flow previously wrapped lines (or at least I would).
  • The particulars of what and how statements are broken should be purely represented by the wrap.scm query file - so this should be quite easy to adjust / extend
  • Being relatively new to rust, the implementation is perhaps not the most efficient (but I also don't want to spend to much time doing premature optimization)

Finally, as an initial implementation I have the following pipeline set up

  1. Split new lines
  2. Format contents
  3. Split lines
  4. If any lines has been split in (3.) repeat from (2.)

@ZedThree ZedThree left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sorenhartmann! This looks great, it's really appreciated!

I have some high level comments:

  • it would be good to get some unit tests on some of the individual functions -- I'm happy to lend a hand with this
  • after fixing writing the output, I notice there are some test failures: it looks like a line break is being forced in cases where one isn't necessary? For example, this snippet:
-  function real_not(a)
+  function real_not( &
+    a)

Ahh, this is because the line length is currently hardcoded to 20, with an indent of 2; changing this to 80 and all the tests pass -- ok, so this is just a matter of needing to handle user options. Again, something I'm happy to help with

  • we'll have to handle the different newline styles -- \n/\r\n/\r, but we can put this off for now
  • this "black-esque" style, with the vertical wrapping, is not going to be to everyone's tastes -- we've had a fair bit of discussion about this. I do like it, because it basically requires no options, and so there can be no arguments about tuning it really. However, we'll likely end up having this as one of two options, with the other being a flatter wrapping style -- just something to bear in mind

Also, a couple of points to help with developing rust:

  • it's good to run cargo clippy fairly frequently (or even better, set up LSP or something in your editor to run it automatically). This is the rust linter, and it's fantastic, it's really helpful for learning rust.
  • README.dev.md describes how to run the tests

Comment thread crates/fortitude_formatter/src/lib.rs Outdated
)
}

if strings.last().unwrap().ends_with('&') {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't handle comments after a &, but maybe that's correct. Presumably if we see something like:

foo = bar & ! comment
  + zag

we only want to indent + zag correctly, but leave the line break and comment alone, otherwise we have to work out where to put the comment.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll admit, I did not consider this edge case, but it seems to me it is handled somewhat reasonably

As far as I can tell, indentation should in general be handled by the topiary query, since indentation is stripped prior to formatting, with no way to preserve it.

Currently the line breaking logic does not allow the breaking of any nodes spanning more than on line, so introducing a comment, would effectively disable line breaking across the entire math expression, i.e.

foo = abc + bar & ! comment
  + zag + def

would break to

foo = abc + bar & ! comment
  + zag + def

and not

foo = abc &
  + bar & ! comment
  + zag &
  + def

Comment thread crates/fortitude_formatter/src/lib.rs Outdated
Comment thread fortitude/resources/test/fixtures/format/long-lines.f90 Outdated
Comment thread crates/fortitude_formatter/src/lib.rs Outdated
ZedThree and others added 4 commits August 18, 2025 14:44
These fit into the `Configuration` framework. Some work still needs to
be done, namely that `Settings` is currently living in the linter
crate, when it should be in the workspace crate.
@ZedThree

Copy link
Copy Markdown
Member

I've wired up the settings to this, and added two tests for your new example: one with the default line length (100), and one with a shorter one (20)

@LiamPattinson

Copy link
Copy Markdown
Collaborator

I think the Clippy issue can be fixed by rebasing on the latest main

@sorenhartmann

Copy link
Copy Markdown
Author

@ZedThree @LiamPattinson Thank you for the feedback, it is really appreciated!

First of all, I apologize for the late reply.

  • I am not really used to the GitHub workflow, but I have tried to respond to your comments above. Also, for rebasing, should I just force-push this branch?
  • I have updated the topiary queries to handle some more constructs, and also handle some indentation across line-breaks.
  • As far as I understand, according to Formatting mode #3, ruff may also be able to handle formatting / line breaking to some extend? This seems very interesting! However, maybe we/I should wait with further development on this PR until the scope of a ruff solution is more clear?

@ZedThree

ZedThree commented Sep 2, 2025

Copy link
Copy Markdown
Member

Thanks @sorenhartmann! Seems like the clippy issue is resolved, so no need to rebase.

We've got a session at a workshop next week, so I might see if we can get some people to try out this branch and get some real world feedback.

As far as I understand, according to #3, ruff may also be able to handle formatting / line breaking to some extend? This seems very interesting! However, maybe we/I should wait with further development on this PR until the scope of a ruff solution is more clear?

That's probably wise, I would really very much hate to waste your time! One thing about using ruff's solution would be that we can take advantage of any improvements they make very easily, but it's possible that it's a lot of work to use it in the first place.

I'm also not sure how flexible their solution is. If it's a bunch of work to achieve the same results you've got here, then it is probably not worth it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants