
Early return in tlcsv ValidateStructure #659

Merged: irees merged 2 commits into main from validate-structure-early-out on Jun 12, 2026

irees (Contributor) commented on Jun 12, 2026

Summary

Reader.ValidateStructure only needs each file's header and whether the file contains at least one data row, but it previously read every row of every file to determine this. For stop_times.txt or shapes.txt, that can mean scanning tens of millions of rows for nothing. It now reads the header and the first data row via ReadRowsIter and then stops. The rest of structure validation (empty-file handling, required-column and duplicate-column checks, and the conditional stops.txt handling for flex feeds) is unchanged.

Behavior preserved

  • A header-only file is still treated as empty: required files report FileRequiredError, and optional files (such as stops.txt in a flex feed) are skipped without error.
  • A file with at least one data row still runs the column-presence and duplicate-column checks against its header.
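The preserved rules can be read as a small decision over the header and the "has a data row" flag. This is a hypothetical Go sketch, not tlcsv code: checkStructure and the three error types are illustrative stand-ins (the real FileRequiredError and column-check errors may differ), and the flex stops.txt case is modeled simply by passing required=false.

```go
package main

import "fmt"

// Hypothetical stand-in error types; tlcsv's real errors may differ.
type fileRequiredError struct{ file string }

func (e fileRequiredError) Error() string { return "required file missing or empty: " + e.file }

type missingColumnError struct{ file, column string }

func (e missingColumnError) Error() string {
	return fmt.Sprintf("%s: missing required column %q", e.file, e.column)
}

type duplicateColumnError struct{ file, column string }

func (e duplicateColumnError) Error() string {
	return fmt.Sprintf("%s: duplicate column %q", e.file, e.column)
}

// checkStructure applies the preserved rules to a header and a hasData flag.
func checkStructure(file string, header []string, hasData, required bool, requiredCols []string) []error {
	if !hasData {
		if required {
			return []error{fileRequiredError{file}} // header-only required file
		}
		return nil // optional and empty: skipped without error
	}
	var errs []error
	seen := map[string]bool{}
	for _, col := range header {
		if seen[col] {
			errs = append(errs, duplicateColumnError{file, col}) // each repeat is reported
		}
		seen[col] = true
	}
	for _, col := range requiredCols {
		if !seen[col] {
			errs = append(errs, missingColumnError{file, col})
		}
	}
	return errs
}

func main() {
	fmt.Println(checkStructure("stops.txt", []string{"stop_id"}, false, true, []string{"stop_id"}))
	fmt.Println(checkStructure("stops.txt", []string{"stop_id"}, false, false, nil))
	fmt.Println(checkStructure("trips.txt", []string{"trip_id", "trip_id"}, true, true, []string{"trip_id", "route_id"}))
}
```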

Test plan

  • A new test, TestValidateStructure_RequiresDataRow, pins the header-vs-data-row distinction: a header-only required file reports FileRequiredError, while the same file with one data row validates.
  • The existing tlcsv structure-validation tests (flex stops.txt conditionals, required-header-only) still pass, confirming the early return produces the same results as the previous full scan.

irees marked this pull request as ready for review on June 12, 2026 10:34
Copilot AI review requested due to automatic review settings on June 12, 2026 10:34

Copilot AI left a comment


Pull request overview

Improves GTFS CSV structure validation performance by avoiding full-file scans during Reader.ValidateStructure, while preserving the existing empty-file/required-file semantics, and adds a regression test to ensure header-only vs. data-row behavior remains correct.

Changes:

  • Update Reader.ValidateStructure to read only the header and first data row (via ReadRowsIter) instead of scanning all rows.
  • Add TestValidateStructure_RequiresDataRow to pin the “header-only is empty” vs. “has at least one data row” distinction.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

  • tlcsv/reader.go: Switches structure validation to exit early after confirming a header and first data row, avoiding costly full scans.
  • tlcsv/validate_structure_test.go: Adds a focused test ensuring header-only required files still error, while header+data does not.


Comment thread: tlcsv/reader.go
irees merged commit 0fb8df5 into main on Jun 12, 2026
6 checks passed
irees deleted the validate-structure-early-out branch on June 12, 2026 10:50
irees mentioned this pull request on Jun 13, 2026