feat: Implement ser/de methods for HashTable #2160
alchemy merge
alchemy link ce74ff8
Added new rebase item:
Failed to cherry-pick commit ce74ff8 in rebase request #2165: Please:

alchemy merge @2026-06-19T11:32:13Z
alchemy link ce74ff8 @2026-06-19T11:32:13Z
The following unexpired item was removed at
Added new rebase item:
Failed to cherry-pick commit ce74ff8 in rebase request #2168: Please:
ce74ff8 to 2ca1ac8

alchemy merge @2026-06-20T05:43:02Z
alchemy link 2ca1ac8 @2026-06-20T05:43:02Z
The following unexpired item was removed at
Added new rebase item:
Failed to cherry-pick commit 2ca1ac8 in rebase request #2174: Please:
2ca1ac8 to 5624f6e

alchemy merge @2026-06-23T06:49:46Z
alchemy link 5624f6e @2026-06-23T06:49:46Z
The following unexpired item was removed at
Added new rebase item:
Problem
HashTable currently does not support native serialization/deserialization. To enable driver-side broadcast hash join in Gluten (similar to Spark), we need to serialize the pre-built hash table on the driver and deserialize it on executors.
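A minimal sketch of the driver-to-executor handoff, under stated assumptions: ToyArrayTable, serialize, and deserialize are hypothetical names for illustration, not Velox APIs. The toy mimics an array-style probe where the slot index is key - minKey, so the range state minKey has to travel with the slots. A copy of the slots alone deserializes fine but probes the wrong positions on the executor.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy, NOT Velox's HashTable: a direct-mapped table whose slot
// index is key - minKey, so probing depends on range state, not just slots.
struct ToyArrayTable {
  int64_t minKey = 0;             // range state, analogous to hasher state
  std::vector<int64_t> values;    // one slot per key in [minKey, minKey + size)
  std::vector<uint8_t> occupied;  // 1 if the slot holds a row

  static ToyArrayTable build(const std::vector<std::pair<int64_t, int64_t>>& rows) {
    ToyArrayTable t;
    if (rows.empty()) return t;
    int64_t lo = rows[0].first;
    int64_t hi = rows[0].first;
    for (const auto& row : rows) {
      lo = std::min(lo, row.first);
      hi = std::max(hi, row.first);
    }
    t.minKey = lo;
    t.values.assign(static_cast<size_t>(hi - lo + 1), 0);
    t.occupied.assign(t.values.size(), 0);
    for (const auto& row : rows) {
      t.values[static_cast<size_t>(row.first - lo)] = row.second;
      t.occupied[static_cast<size_t>(row.first - lo)] = 1;
    }
    return t;
  }

  std::optional<int64_t> probe(int64_t key) const {
    if (key < minKey) return std::nullopt;
    auto idx = static_cast<uint64_t>(key - minKey);
    if (idx >= values.size() || occupied[idx] == 0) return std::nullopt;
    return values[idx];
  }
};

template <typename T>
void writePod(std::string& out, T v) {
  out.append(reinterpret_cast<const char*>(&v), sizeof(T));
}

template <typename T>
T readPod(const char*& p) {
  T v;
  std::memcpy(&v, p, sizeof(T));
  p += sizeof(T);
  return v;
}

// "Driver" side: the slots, plus (optionally) the range state probe() needs.
std::string serialize(const ToyArrayTable& t, bool withRangeState) {
  std::string out;
  if (withRangeState) writePod(out, t.minKey);
  writePod(out, static_cast<uint64_t>(t.values.size()));
  for (size_t i = 0; i < t.values.size(); ++i) {
    writePod(out, t.values[i]);
    writePod(out, t.occupied[i]);
  }
  return out;
}

// "Executor" side: reads back exactly the layout the driver wrote.
ToyArrayTable deserialize(const std::string& bytes, bool withRangeState) {
  ToyArrayTable t;
  const char* p = bytes.data();
  if (withRangeState) t.minKey = readPod<int64_t>(p);
  auto n = readPod<uint64_t>(p);
  t.values.resize(n);
  t.occupied.resize(n);
  for (uint64_t i = 0; i < n; ++i) {
    t.values[i] = readPod<int64_t>(p);
    t.occupied[i] = readPod<uint8_t>(p);
  }
  return t;
}
```

With rows {{100, 7}, {102, 9}}, the full round trip answers probe(102) with 9, while the slots-only copy falls back to minKey = 0 and misses key 102 entirely.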
Context
A HashTable cannot be restored correctly by serializing only the bucket contents. Its runtime behavior also depends on:
- RowContainer row layout and stored row payloads,
- VectorHasher state used by non-kHash probe paths,
- the hash mode (kHash, kArray, kNormalizedKey).
Changes
- velox/common/serialization/NativeSerdeIO.h.
- HashTable::serializedSize(), HashTable::serializeTo(...), and HashTable::deserializeFrom(...).
- columnHasNulls_, VectorHasher state,
- RowContainer::storeSerializedRow(std::string_view, char*) to restore rows directly from serialized row bytes.
- VectorHasher state serialization/deserialization to preserve value-id/range state across processes.
- BigintValuesUsingBloomFilter.
- kHash/kNormalizedKey, kArray to match restored VectorHasher state.
- VectorHasher state round-trip cases.
Testing
- HashTableSerializationTest
- VectorHasher serialization round-trip tests
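A round-trip check in the spirit of HashTableSerializationTest can be sketched against a hypothetical ToyTable that mirrors the serializedSize() / serializeTo(...) / deserializeFrom(...) split. The signatures, the byte layout, and the ToyHashMode enum below are assumptions for illustration, not the Velox code: the caller sizes the buffer first, writes into it once, and the reader rebuilds each row from a byte span, loosely like RowContainer::storeSerializedRow(std::string_view, char*).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <string_view>
#include <vector>

// Local stand-in for the table's hash mode; hypothetical, not the Velox enum.
enum class ToyHashMode : uint8_t { kHash, kArray, kNormalizedKey };

// Hypothetical sketch of a size-then-write layout; not Velox's wire format.
struct ToyTable {
  ToyHashMode mode = ToyHashMode::kHash;
  std::vector<uint8_t> columnHasNulls;  // one flag per key column
  std::vector<std::string> rows;        // opaque serialized row bytes

  // Pass 1: exact byte count, so the caller can allocate the buffer once.
  size_t serializedSize() const {
    size_t size = sizeof(uint8_t)                   // mode
        + sizeof(uint32_t) + columnHasNulls.size()  // null flags
        + sizeof(uint32_t);                         // row count
    for (const auto& row : rows) size += sizeof(uint32_t) + row.size();
    return size;
  }

  // Pass 2: writes into a buffer of serializedSize() bytes; returns bytes written.
  size_t serializeTo(char* out) const {
    char* p = out;
    auto put = [&p](const void* src, size_t n) {
      if (n > 0) std::memcpy(p, src, n);
      p += n;
    };
    const auto m = static_cast<uint8_t>(mode);
    put(&m, sizeof(m));
    const auto numCols = static_cast<uint32_t>(columnHasNulls.size());
    put(&numCols, sizeof(numCols));
    put(columnHasNulls.data(), numCols);
    const auto numRows = static_cast<uint32_t>(rows.size());
    put(&numRows, sizeof(numRows));
    for (const auto& row : rows) {
      const auto len = static_cast<uint32_t>(row.size());
      put(&len, sizeof(len));
      put(row.data(), len);
    }
    return static_cast<size_t>(p - out);
  }

  static ToyTable deserializeFrom(const char* in) {
    ToyTable t;
    const char* p = in;
    auto take = [&p](void* dst, size_t n) {
      if (n > 0) std::memcpy(dst, p, n);
      p += n;
    };
    uint8_t m = 0;
    take(&m, sizeof(m));
    t.mode = static_cast<ToyHashMode>(m);
    uint32_t numCols = 0;
    take(&numCols, sizeof(numCols));
    t.columnHasNulls.resize(numCols);
    take(t.columnHasNulls.data(), numCols);
    uint32_t numRows = 0;
    take(&numRows, sizeof(numRows));
    for (uint32_t i = 0; i < numRows; ++i) {
      uint32_t len = 0;
      take(&len, sizeof(len));
      // Rebuild the row straight from its byte span.
      t.rows.emplace_back(std::string_view(p, len));
      p += len;
    }
    return t;
  }
};
```

Returning the byte count from serializeTo lets a test assert that the write pass consumed exactly what serializedSize() promised, which catches the two passes drifting apart.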
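The VectorHasher round trip can be sketched the same way with a hypothetical ToyValueIdHasher (not the Velox class) that assigns dense ids to distinct values in first-seen order and tracks the observed [min, max] range. The design point it illustrates is an assumption about what such a test must guard: values are written in id order from a side vector rather than by iterating the hash map, whose iteration order is unspecified, so the reader reassigns the same ids the writer used.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical toy, NOT Velox's VectorHasher: dense value ids in first-seen
// order plus the observed [min, max] range.
class ToyValueIdHasher {
 public:
  // Returns the id for v, assigning the next id if v is new.
  uint32_t idOf(int64_t v) {
    auto [it, inserted] = ids_.try_emplace(v, static_cast<uint32_t>(values_.size()));
    if (inserted) {
      values_.push_back(v);
      min_ = std::min(min_, v);
      max_ = std::max(max_, v);
    }
    return it->second;
  }

  // Read-only lookup, as a probe would do.
  std::optional<uint32_t> lookup(int64_t v) const {
    auto it = ids_.find(v);
    if (it == ids_.end()) return std::nullopt;
    return it->second;
  }

  int64_t rangeMin() const { return min_; }
  int64_t rangeMax() const { return max_; }

  std::string serialize() const {
    std::string out;
    writePod(out, min_);
    writePod(out, max_);
    writePod(out, static_cast<uint64_t>(values_.size()));
    // Id order, not ids_ iteration order, so ids survive the round trip.
    for (int64_t v : values_) writePod(out, v);
    return out;
  }

  static ToyValueIdHasher deserialize(const std::string& in) {
    ToyValueIdHasher h;
    const char* p = in.data();
    h.min_ = readPod<int64_t>(p);
    h.max_ = readPod<int64_t>(p);
    auto n = readPod<uint64_t>(p);
    for (uint64_t i = 0; i < n; ++i) {
      int64_t v = readPod<int64_t>(p);
      h.ids_.emplace(v, static_cast<uint32_t>(i));
      h.values_.push_back(v);
    }
    return h;
  }

 private:
  template <typename T>
  static void writePod(std::string& out, T v) {
    out.append(reinterpret_cast<const char*>(&v), sizeof(T));
  }

  template <typename T>
  static T readPod(const char*& p) {
    T v;
    std::memcpy(&v, p, sizeof(T));
    p += sizeof(T);
    return v;
  }

  std::unordered_map<int64_t, uint32_t> ids_;
  std::vector<int64_t> values_;  // values_[id] == value
  int64_t min_ = std::numeric_limits<int64_t>::max();
  int64_t max_ = std::numeric_limits<int64_t>::min();
};
```

After inserting 42, -7, 42, and 1000, the ids are 0, 1, and 2 for 42, -7, and 1000 on both sides of the round trip, and the range is [-7, 1000].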