fix: skip UTF-8 BOM when processing front matter #2751
Open
StressTestor wants to merge 1 commit into
--front-matter=process wrapped the file in a bufio.Reader without skipping a leading UTF-8 BOM, so a BOM before the opening --- meant the separator was not recognised and the opening --- was dropped from the output (data loss). Skip the BOM with utfbom.Skip before wrapping the reader, matching the CSV object decoder.
Owner
@cursoragent review this
ccoVeille
reviewed
Jun 25, 2026
	"io"
	"os"

	"github.com/dimchansky/utfbom"
Contributor
I find it strange to add such a dependency just to skip a UTF BOM.
The repository is stale (which is OK given how stable the feature is), but you only use the Skip method.
Does yq need to handle every kind of UTF BOM? Isn't UTF-8 enough?
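import (
	"bufio"
	"bytes"
	"io"
)

// utf8BOM is the three-byte UTF-8 encoding of U+FEFF.
var utf8BOM = []byte{0xEF, 0xBB, 0xBF}

// StripUTF8BOM returns a reader that skips a leading UTF-8 BOM if present.
func StripUTF8BOM(r io.Reader) io.Reader {
	br := bufio.NewReader(r)
	// Peek fails on inputs shorter than 3 bytes; those cannot hold a BOM.
	peek, err := br.Peek(3)
	if err == nil && bytes.Equal(peek, utf8BOM) {
		_, _ = br.Discard(3)
	}
	return br
}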
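One way to sanity-check the suggestion above is to run it against inputs with and without a BOM, including inputs shorter than three bytes. The sketch below restates the helper so that it runs on its own. The `readAll` helper, the `main` function, and the sample inputs are illustrative assumptions and are not part of the PR.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"strings"
)

var utf8BOM = []byte{0xEF, 0xBB, 0xBF}

// StripUTF8BOM mirrors the helper suggested in the review comment.
func StripUTF8BOM(r io.Reader) io.Reader {
	br := bufio.NewReader(r)
	// Peek returns an error for inputs shorter than 3 bytes. Such inputs
	// cannot hold a BOM, and the buffered bytes stay readable afterwards.
	if peek, err := br.Peek(3); err == nil && bytes.Equal(peek, utf8BOM) {
		_, _ = br.Discard(3)
	}
	return br
}

// readAll is a hypothetical helper for this demo: it runs the input
// through StripUTF8BOM and returns everything that comes out.
func readAll(in string) string {
	out, err := io.ReadAll(StripUTF8BOM(strings.NewReader(in)))
	if err != nil {
		panic(err)
	}
	return string(out)
}

func main() {
	inputs := []string{
		"\xEF\xBB\xBF---\na: 1\n", // BOM before the separator
		"---\na: 1\n",             // no BOM
		"ab",                      // shorter than a BOM
		"",                        // empty input
	}
	for _, in := range inputs {
		fmt.Printf("%q -> %q\n", in, readAll(in))
	}
}
```

Only the first input should change. The other three should come back byte for byte, which is the same property the PR description claims for `utfbom.Skip`.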
what
yq --front-matter=process drops the opening --- when the input file starts with a UTF-8 BOM, which loses data. The opening --- is gone. This is #2496.

why
front_matter.go's Split() wraps the file in a bufio.Reader without skipping a leading BOM. The first line read is then \xEF\xBB\xBF---, so the --- separator is not recognised and that line is treated as content instead of the front-matter delimiter.

fix
Skip the BOM with utfbom.Skip before wrapping the reader, the same way decoder_csv_object.go already does. utfbom is already a direct dependency.
Applied to both the file and stdin paths.
utfbom.Skip replays non-BOM bytes unchanged, so files without a BOM are byte-identical.

tests
TestFrontMatterSplitWithBOM in front_matter_test.go feeds a BOM-prefixed document and asserts the front matter and content split correctly. It fails before the fix (the front matter starts with the BOM) and passes after.

Fixes #2496.
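
The cause described above can be reproduced outside yq with a few lines of Go. The sketch below is a simplified stand-in and not yq's actual Split(): `firstLine` and `isSeparator` are hypothetical helpers, and `strings.TrimPrefix` stands in for `utfbom.Skip` so that the example needs only the standard library.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// bom is U+FEFF, which encodes to the bytes EF BB BF in UTF-8.
const bom = "\uFEFF"

// firstLine is a hypothetical helper: it returns the first line of the
// input without its trailing line ending.
func firstLine(input string) string {
	line, _ := bufio.NewReader(strings.NewReader(input)).ReadString('\n')
	return strings.TrimRight(line, "\r\n")
}

// isSeparator is a simplified stand-in for the separator check;
// yq's real Split() may differ in detail.
func isSeparator(line string) bool {
	return line == "---"
}

func main() {
	doc := bom + "---\ntitle: hello\n---\nbody\n"

	// Without skipping the BOM, the first line is "\uFEFF---".
	fmt.Println(isSeparator(firstLine(doc))) // false

	// With the BOM removed first, the separator is recognised.
	fmt.Println(isSeparator(firstLine(strings.TrimPrefix(doc, bom)))) // true
}
```

The BOM is invisible in most editors and terminals, so the two first lines look identical on screen even though the comparison fails for one of them.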