
fix: skip UTF-8 BOM when processing front matter #2751

Open
StressTestor wants to merge 1 commit into mikefarah:master from StressTestor:fix/2496-front-matter-bom

@StressTestor


Disclosure: this PR was written by an AI agent (Claude Code) acting on the user's behalf, not the user personally.

what

yq --front-matter=process drops the opening --- when the input file starts with a UTF-8 BOM, which loses data:

$ printf '\xEF\xBB\xBF---\ntitle: Test\n---\n' > test.md
$ yq --front-matter=process test.md
title: Test
---

The opening --- is gone. This is #2496.

why

front_matter.go's Split() wraps the file in a bufio.Reader without skipping a leading BOM. The first line read is then \xEF\xBB\xBF---, so the --- separator is not recognised and that line is treated as content instead of the front-matter delimiter.
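
The mismatch described above can be reproduced in isolation. The sketch below is not yq's actual Split() code: firstLineIsSeparator is a hypothetical helper, and it assumes the separator check is an exact comparison of the first line against ---. It only illustrates why three extra leading bytes are enough to defeat such a comparison.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// firstLineIsSeparator is a hypothetical helper (not part of yq). It reads the
// first line through a bufio.Reader and compares it with the "---" separator.
func firstLineIsSeparator(input string) bool {
	reader := bufio.NewReader(strings.NewReader(input))
	// The read error is ignored on purpose: a missing trailing newline still
	// leaves the partial line in `line`.
	line, _ := reader.ReadString('\n')
	return strings.TrimRight(line, "\r\n") == "---"
}

func main() {
	plain := "---\ntitle: Test\n---\n"
	withBOM := "\xEF\xBB\xBF" + plain // same document, prefixed with a UTF-8 BOM

	fmt.Println(firstLineIsSeparator(plain))   // true
	fmt.Println(firstLineIsSeparator(withBOM)) // false: the line is BOM + "---"
}
```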

fix

Skip the BOM with utfbom.Skip before wrapping the reader, the same way decoder_csv_object.go already does. utfbom is already a direct dependency.

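// utfbom.Skip returns the wrapped reader and the detected encoding; the encoding is not needed here.
cleanReader, _ := utfbom.Skip(file)
reader = bufio.NewReader(cleanReader)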

The same change is applied to both the file and stdin paths. utfbom.Skip replays non-BOM bytes unchanged, so output for input without a BOM stays byte-identical.

tests

TestFrontMatterSplitWithBOM in front_matter_test.go feeds a BOM-prefixed document and asserts the front matter and content split correctly. It fails before the fix (the front matter starts with the BOM) and passes after.

Fixes #2496.

--front-matter=process wrapped the file in a bufio.Reader without
skipping a leading UTF-8 BOM, so a BOM before the opening --- meant the
separator was not recognised and the opening --- was dropped from the
output (data loss). Skip the BOM with utfbom.Skip before wrapping the
reader, matching the CSV object decoder.
@mikefarah

Owner

@cursoragent review this

Comment thread pkg/yqlib/front_matter.go
"io"
"os"

"github.com/dimchansky/utfbom"

Contributor

I find it strange to add such a dependency just to skip a UTF BOM.

The repository is stale (which is OK given how stable the feature is), but you use only the Skip method.

Does yq need to handle every UTF BOM? Isn't UTF-8 enough?

var utf8BOM = []byte{0xEF, 0xBB, 0xBF}

// StripUTF8BOM returns a reader that skips a leading UTF-8 BOM if present.
func StripUTF8BOM(r io.Reader) io.Reader {
	br := bufio.NewReader(r)

	peek, err := br.Peek(3)
	if err == nil && bytes.Equal(peek, utf8BOM) {
		_, _ = br.Discard(3)
	}

	return br
}
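
To check how the suggested helper behaves at the edges, the sketch below copies it into a small standalone program and runs it on three inputs: one with a BOM, one without, and one shorter than three bytes. readStripped is a hypothetical helper added for the demonstration only, and nothing here is taken from yq itself.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"strings"
)

var utf8BOM = []byte{0xEF, 0xBB, 0xBF}

// StripUTF8BOM is copied from the suggestion above: it returns a reader that
// skips a leading UTF-8 BOM if present.
func StripUTF8BOM(r io.Reader) io.Reader {
	br := bufio.NewReader(r)

	// Peek does not consume input. If fewer than 3 bytes are available it
	// returns an error, and the bytes stay buffered for later reads.
	peek, err := br.Peek(3)
	if err == nil && bytes.Equal(peek, utf8BOM) {
		_, _ = br.Discard(3)
	}

	return br
}

// readStripped is a hypothetical demo helper: it drains the stripped reader
// and returns everything it produced as a string.
func readStripped(input string) string {
	out, err := io.ReadAll(StripUTF8BOM(strings.NewReader(input)))
	if err != nil {
		panic(err)
	}
	return string(out)
}

func main() {
	fmt.Printf("%q\n", readStripped("\xEF\xBB\xBF---\ntitle: Test\n---\n")) // BOM removed
	fmt.Printf("%q\n", readStripped("---\ntitle: Test\n---\n"))             // unchanged
	fmt.Printf("%q\n", readStripped("--"))                                   // shorter than a BOM, unchanged
}
```

Under these assumptions, input without a BOM comes back byte-for-byte unchanged, including input too short to hold one. Unlike a multi-encoding BOM library, this helper does not detect or skip UTF-16 or UTF-32 BOMs.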

Contributor

What do you think, @mikefarah?


Development

Successfully merging this pull request may close these issues.

Process front matter of UTF-8 files with BOM doesn't work.

3 participants