Component Selection
Describe the Bug
The native Parquet reader can fail when reading non-top-level repeated columns with chunk-level repetition/definition levels.
The failure happens when the value reader advances to a later physical data page before the corresponding repetition/definition-level page metadata has been materialized into numLeavesInPage_.
In this state, PageReader::setPageRowInfo() increments pageIndex_ and checks that the page index is already covered by numLeavesInPage_:
    BOLT_CHECK_LT(
        pageIndex_,
        numLeavesInPage_.size(),
        "Seeking past known repdefs for non top level column page {}",
        pageIndex_);
However, this state can be recoverable. More rep/def metadata may already be staged in preloadedRepDefs_, but not yet decoded into numLeavesInPage_.
As a result, sparse/selective reads over repeated Parquet columns can throw the following error even though the reader already has pending rep/def batches available and could continue by materializing them:
    Seeking past known repdefs for non top level column page N
Reproduction Steps
A focused unit test can reproduce the failure by constructing the relevant PageReader state directly:
- Create a non-top-level repeated leaf PageReader.
- Set chunk-level rep/def state:
    hasChunkRepDefs_ = true
    pageIndex_ = 0
    numLeavesInPage_ = {1}
- Add one pending rep/def batch to preloadedRepDefs_.
- Call setPageRowInfo(false).
Without the fix, setPageRowInfo(false) increments pageIndex_ to 1, sees that numLeavesInPage_.size() is still 1, and fails with:
(1 vs. 1) Seeking past known repdefs for non top level column page 1
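For readers without a Bolt checkout, the steps above can be approximated with a small standalone model. This is only a sketch: ToyPageReader, setPageRowInfoStrict, and reproduceStrictFailure are hypothetical names, the pending batch is modeled as a plain list of per-page leaf counts (an assumption about its contents), and hasChunkRepDefs_ is omitted because the model always takes the chunk-level path. Only the member names pageIndex_, numLeavesInPage_, and preloadedRepDefs_ come from this report.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the PageReader state described in this report.
// It is not Bolt code; only the member names mirror the real fields.
struct ToyPageReader {
  int32_t pageIndex_ = 0;
  std::vector<int32_t> numLeavesInPage_{1};
  // Assumption: each pending batch carries leaf counts for later pages.
  std::deque<std::vector<int32_t>> preloadedRepDefs_;

  // Models the strict behavior: advance the page index, then check at once.
  void setPageRowInfoStrict() {
    ++pageIndex_;
    if (pageIndex_ >= static_cast<int32_t>(numLeavesInPage_.size())) {
      throw std::runtime_error(
          "Seeking past known repdefs for non top level column page " +
          std::to_string(pageIndex_));
    }
  }
};

// Returns the error message raised by the strict path, or "" if none.
std::string reproduceStrictFailure() {
  ToyPageReader reader;
  reader.preloadedRepDefs_.push_back({1}); // one pending rep/def batch
  try {
    reader.setPageRowInfoStrict();
  } catch (const std::runtime_error& e) {
    // The pending batch was never consumed, so the state was recoverable.
    assert(reader.preloadedRepDefs_.size() == 1);
    return e.what();
  }
  return "";
}
```

Calling reproduceStrictFailure() returns the same message text as the real check, while the pending batch sits untouched in preloadedRepDefs_.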
A regression test that could be added for this case is:
TEST_F(ParquetPageReaderTest, loadsPendingRepDefsBeforePageRowInfoCheck)
The issue can also be triggered by sparse/selective reads over repeated Parquet columns, especially when rep/def decoding is batched and the value reader reaches a later physical page while more rep/def metadata
remains pending in preloadedRepDefs_.
Targeted verification command:
cmake --build _build/Release --target bolt_dwio_parquet_reader_test
_build/Release/bolt/dwio/parquet/tests/reader/bolt_dwio_parquet_reader_test \
    --gtest_filter=ParquetPageReaderTest.loadsPendingRepDefsBeforePageRowInfoCheck
Observed result without the fix:
[ FAILED ] ParquetPageReaderTest.loadsPendingRepDefsBeforePageRowInfoCheck
Reason: (1 vs. 1) Seeking past known repdefs for non top level column page 1
Expression: pageIndex_ < numLeavesInPage_.size()
Function: setPageRowInfo
File: bolt/dwio/parquet/reader/PageReader.cpp
Bolt Version / Commit ID
main
System Configuration
- **OS**: (e.g. Ubuntu 22.04, CentOS 7)
- **Compiler**: (e.g. GCC 11, Clang 14)
- **Build Type**: (Debug / Release / RelWithDebInfo)
- **CPU Arch**: (e.g. x86_64 AVX2, ARM64)
- **Framework**: (e.g. Spark 3.3, PrestoDB)
Logs / Stack Trace
Error Source: RUNTIME
Error Code: INVALID_STATE
Reason: (5 vs. 5) Seeking past known repdefs for non top level column page 5
Retriable: False
Expression: pageIndex_ < numLeavesInPage_.size()
Additional Context: Operator: TableScan[0] 0
Function: setPageRowInfo
File: bolt/dwio/parquet/reader/PageReader.cpp
Line: 265
Expected Behavior
When PageReader::setPageRowInfo() advances to a non-top-level repeated-column page whose metadata is not yet present in numLeavesInPage_, it should first check whether additional rep/def batches are pending
in preloadedRepDefs_.
If pending batches exist, the reader should materialize them by calling loadMoreRepDefs() before enforcing the bounds check.
The existing safety check should still remain in place. If no pending rep/def batches exist and pageIndex_ is still beyond numLeavesInPage_, the reader should continue to fail as before because that indicates
a true invalid seek or corrupted/inconsistent state.
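The two outcomes described here (recover when batches are pending, fail when none are) can be sketched with a standalone model. This is an illustration under assumptions, not Bolt code: ToyPageReader and advanceSucceeds are hypothetical names, and the body of loadMoreRepDefs() below is a guess that treats each pending batch as a list of per-page leaf counts.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical model of the reader state; only the member and method names
// pageIndex_, numLeavesInPage_, preloadedRepDefs_, and loadMoreRepDefs come
// from this report.
struct ToyPageReader {
  int32_t pageIndex_ = 0;
  std::vector<int32_t> numLeavesInPage_{1};
  std::deque<std::vector<int32_t>> preloadedRepDefs_;

  // Assumed behavior: decode one pending batch into numLeavesInPage_.
  void loadMoreRepDefs() {
    assert(!preloadedRepDefs_.empty());
    const std::vector<int32_t>& batch = preloadedRepDefs_.front();
    numLeavesInPage_.insert(
        numLeavesInPage_.end(), batch.begin(), batch.end());
    preloadedRepDefs_.pop_front();
  }

  // Expected behavior: drain pending batches first, then enforce the check.
  void setPageRowInfo() {
    ++pageIndex_;
    while (pageIndex_ >= static_cast<int32_t>(numLeavesInPage_.size()) &&
           !preloadedRepDefs_.empty()) {
      loadMoreRepDefs();
    }
    if (pageIndex_ >= static_cast<int32_t>(numLeavesInPage_.size())) {
      throw std::runtime_error(
          "Seeking past known repdefs for non top level column page " +
          std::to_string(pageIndex_));
    }
  }
};

// Returns true if advancing one page succeeds with the given number of
// pending single-page batches staged in preloadedRepDefs_.
bool advanceSucceeds(int pendingBatches) {
  ToyPageReader reader;
  for (int i = 0; i < pendingBatches; ++i) {
    reader.preloadedRepDefs_.push_back({1});
  }
  try {
    reader.setPageRowInfo();
  } catch (const std::runtime_error&) {
    return false;
  }
  return true;
}
```

In this model advanceSucceeds(1) returns true and advanceSucceeds(0) returns false, so the bounds check still catches a seek that no pending metadata can cover. The loop terminates because each iteration removes one batch from preloadedRepDefs_, even if a batch turns out to be empty.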
Additional context
Root cause hypothesis:
numLeavesInPage_ tracks the number of leaf values decoded from rep/def metadata for each data page. With batched rep/def decoding, this vector may lag behind the physical data page reached by the value reader.
Sparse/selective scans can advance the value path across page boundaries while only a subset of rep/def page metadata has been decoded into numLeavesInPage_.
The important detail is that the missing metadata may already be available in preloadedRepDefs_. In that case, failing immediately is too strict. The reader should lazily decode pending rep/def batches until either:
- pageIndex_ < numLeavesInPage_.size(), or
- there are no pending rep/def batches left.
The fix is small and preserves the original invariant:
    while (pageIndex_ >= static_cast<int32_t>(numLeavesInPage_.size()) &&
           !preloadedRepDefs_.empty()) {
      loadMoreRepDefs();
    }
    BOLT_CHECK_LT(
        pageIndex_,
        numLeavesInPage_.size(),
        "Seeking past known repdefs for non top level column page {}",
        pageIndex_);
Validation:
- The focused regression test fails without the fix with the expected "Seeking past known repdefs" error.
- The same test passes with the fix.
- The fix only materializes already-preloaded rep/def batches and does not suppress the existing bounds check.