[fix] indentation & trailing whitespace in Heredoc#945

Open
nicholasdower wants to merge 2 commits into fables-tales:trunk from nicholasdower:nickd/heredoc-whitespace

@nicholasdower

Contributor

An attempt at fixing a few Heredoc issues (including #921).

See: https://docs.ruby-lang.org/en/master/syntax/literals_rdoc.html#here-document-literals

> The indentation of the least-indented line will be removed from each
> line of the content. Note that empty lines and lines consisting solely
> of literal tabs and spaces will be ignored for the purposes of
> determining indentation, but escaped tabs and spaces are considered
> non-indentation characters.
>
> For the purpose of measuring an indentation, a horizontal tab is
> regarded as a sequence of one to eight spaces such that the column
> position corresponding to its end is a multiple of eight. The amount to
> be removed is counted in terms of the number of spaces. If the boundary
> appears in the middle of a tab, that tab is not removed.
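
As a rough illustration of the rule quoted above, here is a minimal Ruby sketch of the measuring and stripping steps. It is not the code in this branch, nor Ruby's own implementation. The names `TAB_WIDTH`, `next_tab_stop`, `indent_width`, `strip_indent`, and `dedent` are hypothetical, and lines are assumed to be passed without their trailing newline.

```ruby
# Hypothetical sketch of squiggly-heredoc dedenting; all names are illustrative.
TAB_WIDTH = 8

# Column reached after a tab that starts at column `col`.
def next_tab_stop(col)
  (col / TAB_WIDTH + 1) * TAB_WIDTH
end

# Width, in columns, of the leading spaces and tabs of `line`.
def indent_width(line)
  width = 0
  line.each_char do |ch|
    case ch
    when " "  then width += 1
    when "\t" then width = next_tab_stop(width)
    else break
    end
  end
  width
end

# Removes up to `amount` columns of leading whitespace from `line`.
# A tab that would cross the boundary is left in place.
def strip_indent(line, amount)
  col = 0
  idx = 0
  while col < amount && idx < line.length
    case line[idx]
    when " " then col += 1
    when "\t"
      break if next_tab_stop(col) > amount
      col = next_tab_stop(col)
    else break
    end
    idx += 1
  end
  line[idx..]
end

# Dedents `lines` by the width of the least-indented line, ignoring
# empty lines and lines made only of spaces and tabs when measuring.
def dedent(lines)
  significant = lines.reject { |line| line.match?(/\A[ \t]*\z/) }
  amount = significant.map { |line| indent_width(line) }.min || 0
  lines.map { |line| strip_indent(line, amount) }
end
```

For example, `dedent(["         foo", " \tbar"])` returns `[" foo", "bar"]`: the second line measures 8 columns, so 8 columns are removed from each line. Whitespace-only lines do not affect the measured amount but are still stripped by it, which is how trailing-whitespace lines keep their remainder.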

Issues Addressed

To run the following tests locally, first run:

git fetch https://github.com/nicholasdower/rubyfmt-fork.git nickd/heredoc-whitespace
git switch --detach FETCH_HEAD
cargo build --release

1 tab equals 8 spaces

The first line is indented 8 spaces. The second is indented 1 tab. The indentation is equal.

Test

code='puts <<~FOO.inspect
        foo
	bar
FOO'

echo "$code" | sed -n l && echo
printf 'original: %s\n' "$(ruby -e "$code")"
printf 'trunk:    %s\n' "$(echo "$code" | rubyfmt | ruby)"
printf 'branch:   %s\n' "$(echo "$code" | ./target/release/rubyfmt-main | ruby)"

Result

puts <<~FOO.inspect$
        foo$
\tbar$
FOO$

original: "foo\nbar\n"
trunk:    "foo\n\tbar\n"
branch:   "foo\nbar\n"

1 space + 1 tab equals 8 spaces

The first line is indented 9 spaces. The second is indented 1 space and 1 tab, which measures 8 columns. Removing 8 columns from each line should leave 1 space before "foo".

Test

code='puts <<~FOO.inspect
         foo
 	bar
FOO'

echo "$code" | sed -n l && echo
printf 'original: %s\n' "$(ruby -e "$code")"
printf 'trunk:    %s\n' "$(echo "$code" | rubyfmt | ruby)"
printf 'branch:   %s\n' "$(echo "$code" | ./target/release/rubyfmt-main | ruby)"

Result

puts <<~FOO.inspect$
         foo$
 \tbar$
FOO$

original: " foo\nbar\n"
trunk:    "foo\n\tbar\n"
branch:   " foo\nbar\n"

Trailing whitespace in squiggly Heredoc should not be stripped

Test

code='puts <<~FOO.inspect
  foo 
  bar	
   
  	
FOO'

echo "$code" | sed -n l && echo
printf 'original: %s\n' "$(ruby -e "$code")"
printf 'trunk:    %s\n' "$(echo "$code" | rubyfmt | ruby)"
printf 'branch:   %s\n' "$(echo "$code" | ./target/release/rubyfmt-main | ruby)"

Result

puts <<~FOO.inspect$
  foo $
  bar\t$
   $
  \t$
FOO$

original: "foo \nbar\t\n \n\t\n"
trunk:    "foo\nbar\n\n\n"
branch:   "foo \nbar\t\n \n\t\n"

A single whitespace-only line should be preserved

Test

code='puts <<~FOO.inspect
 
FOO'

echo "$code" | sed -n l && echo
printf 'original: %s\n' "$(ruby -e "$code")"
printf 'trunk:    %s\n' "$(echo "$code" | rubyfmt | ruby)"
printf 'branch:   %s\n' "$(echo "$code" | ./target/release/rubyfmt-main | ruby)"

Result

puts <<~FOO.inspect$
 $
FOO$

original: " \n"
trunk:    ""
branch:   " \n"

Multiple whitespace-only lines should be preserved

Test

code='puts <<~FOO.inspect
 
 
FOO'

echo "$code" | sed -n l && echo
printf 'original: %s\n' "$(ruby -e "$code")"
printf 'trunk:    %s\n' "$(echo "$code" | rubyfmt | ruby)"
printf 'branch:   %s\n' "$(echo "$code" | ./target/release/rubyfmt-main | ruby)"

Result

puts <<~FOO.inspect$
 $
 $
FOO$

original: " \n \n"

thread 'main' (14594026) panicked at librubyfmt/src/line_tokens.rs:203:17:
shouldn't ever have a single newline direct part
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
trunk:
branch:   " \n \n"

Issues Not Yet Addressed

Heredoc containing only an empty line

Test

code='puts <<~FOO.inspect

FOO'

echo "$code" | sed -n l && echo
printf 'original: %s\n' "$(ruby -e "$code")"
printf 'trunk:    %s\n' "$(echo "$code" | rubyfmt | ruby)"
printf 'branch:   %s\n' "$(echo "$code" | ./target/release/rubyfmt-main | ruby)"

Result

puts <<~FOO.inspect$
$
FOO$

original: "\n"
trunk:    ""
branch:   ""

Heredoc containing only multiple empty lines

Test

code='puts <<-FOO.inspect


FOO'

echo "$code" | sed -n l && echo
printf 'original: %s\n' "$(ruby -e "$code")"
printf 'trunk:    %s\n' "$(echo "$code" | rubyfmt | ruby)"
printf 'branch:   %s\n' "$(echo "$code" | ./target/release/rubyfmt-main | ruby)"

Result

puts <<-FOO.inspect$
$
$
FOO$

original: "\n\n"

thread 'main' (14697550) panicked at librubyfmt/src/line_tokens.rs:203:17:
shouldn't ever have a single newline direct part
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
trunk:

thread 'main' (14697558) panicked at librubyfmt/src/line_tokens.rs:203:17:
shouldn't ever have a single newline direct part
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
branch:

@nicholasdower force-pushed the nickd/heredoc-whitespace branch 3 times, most recently from 63959ad to 7153043 (June 14, 2026 07:55)
@nicholasdower

nicholasdower commented Jun 14, 2026

Contributor Author

In my original commit, I added test cases to existing fixtures. But this had a few downsides:

  • Since I care about both the formatting and the Ruby output, I needed to write the same test in two places: ci/string_literals_stress_test.rb and fixtures/small/heredoc_indented_whitespace_actual.rb.
  • Dealing with actual and expected files was cumbersome.
  • It was hard to see tabs and trailing whitespace.
  • Since the fixtures contain multiple test cases, it was hard to tell which test was actually failing.

I've now added a commit that introduces tests/string_test.rs. Like tests/stress_test.rs, it compares the output of ruby before and after formatting, but it also verifies the formatted output of rubyfmt and takes its inputs as inline strings rather than input files.

Please let me know whether this is an acceptable approach.
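
The two checks described above (verify the formatted text, then compare what ruby prints before and after formatting) can be sketched in Ruby as follows. This is only an illustration of the idea, not the contents of tests/string_test.rs, which is written in Rust. The helpers `run_ruby` and `format_preserves_behavior?` and the `formatter` parameter are hypothetical.

```ruby
require "open3"
require "rbconfig"

# Runs `source` through the current Ruby interpreter and returns its stdout.
def run_ruby(source)
  stdout, status = Open3.capture2(RbConfig.ruby, stdin_data: source)
  raise "ruby exited with #{status.exitstatus}" unless status.success?
  stdout
end

# Hypothetical helper: `formatter` is any callable that maps source text to
# formatted source text. Returns true when the formatted text equals
# `expected_format` and the program prints the same output before and
# after formatting.
def format_preserves_behavior?(source, expected_format, formatter)
  formatted = formatter.call(source)
  formatted == expected_format && run_ruby(source) == run_ruby(formatted)
end

# Usage with an identity "formatter", which trivially preserves behavior.
identity = ->(src) { src }
source = "puts 1 + 1\n"
puts format_preserves_behavior?(source, source, identity)
```

To exercise rubyfmt itself, the callable could pipe the source to the binary on stdin, as the shell tests in the description do. Keeping the formatter injectable is a design choice of this sketch so that the comparison logic can be tried without a built binary.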

Also note that I had to bump the Ruby version in CI to 3.4. Prior to 3.4, there was a bug related to trailing whitespace in heredocs:

printf 'puts <<~FOO.inspect\n \nFOO\n' | /Users/nickdower/.rvm/rubies/ruby-3.3.0/bin/ruby
"\n"
printf 'puts <<~FOO.inspect\n \nFOO\n' | /Users/nickdower/.rvm/rubies/ruby-3.4.1/bin/ruby
" \n"

The issue was fixed in ruby/ruby@5bb656e4.

@nicholasdower force-pushed the nickd/heredoc-whitespace branch from 7153043 to 04d41ff (June 14, 2026 08:55)
@nicholasdower force-pushed the nickd/heredoc-whitespace branch from 04d41ff to 8ce11e3 (June 14, 2026 09:11)