
feat: Implement se/dser method for HashTable #2160

Open: JkSelf wants to merge 1 commit into oss-main from serialize-hashtable



@JkSelf JkSelf commented Jun 19, 2026

Collaborator

Problem

HashTable currently has no native serialization or deserialization support.
To enable driver-side broadcast hash join in Gluten (similar to Spark), the hash table has to be built once on the driver, serialized there, and deserialized on the executors.

Context

A HashTable cannot be restored correctly by serializing only the bucket contents. Its runtime behavior also depends on:

  • RowContainer row layout and stored row payloads,
  • VectorHasher state used by non-kHash probe paths,
  • optional bloom filter state,
  • hash-mode-specific rebuild behavior (kHash, kArray, kNormalizedKey).
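
To make the hash-mode dependency above concrete, here is a minimal toy sketch; it is not Velox code. All names (ToyHashMode, ToyRangeState, ToyRow, ToyTable, toyHash, rebuild, probe) are hypothetical, and it assumes unique int64 keys. It shows why rows alone are not enough: an array-style lookup only works if the value range the hasher learned at build time is restored too, while a hash-style lookup can reuse the hashes stored with the rows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical toy types for illustration; none of these are Velox APIs.
enum class ToyHashMode { kHash, kArray };

// Stand-in for the value-range state a hasher keeps for array-style lookup.
struct ToyRangeState {
  int64_t min{0};
  int64_t max{-1}; // Empty range by default.
};

struct ToyRow {
  int64_t key;
  uint64_t storedHash; // Hash saved alongside the row at build time.
};

struct ToyTable {
  ToyHashMode mode;
  ToyRangeState range;        // Only used in kArray mode.
  std::vector<int32_t> slots; // Row index per slot, -1 when empty.
};

// Deterministic multiplicative hash so stored hashes agree across processes.
inline uint64_t toyHash(int64_t key) {
  return static_cast<uint64_t>(key) * 0x9E3779B97F4A7C15ULL;
}

// Rebuilds the lookup structure from restored rows (assumes unique keys).
inline ToyTable rebuild(
    ToyHashMode mode,
    const ToyRangeState& range,
    const std::vector<ToyRow>& rows) {
  ToyTable table{mode, range, {}};
  if (mode == ToyHashMode::kArray) {
    // Slot positions are recomputed from the restored range, not read back.
    if (range.max < range.min) {
      throw std::runtime_error("kArray mode needs a restored value range");
    }
    table.slots.assign(static_cast<size_t>(range.max - range.min + 1), -1);
    for (size_t i = 0; i < rows.size(); ++i) {
      const int64_t key = rows[i].key;
      if (key < range.min || key > range.max) {
        throw std::runtime_error("row key outside restored value range");
      }
      table.slots[static_cast<size_t>(key - range.min)] =
          static_cast<int32_t>(i);
    }
    return table;
  }
  // kHash: reuse the stored hash; linear probing in a power-of-two table
  // that is kept at most half full, so an empty slot always exists.
  size_t capacity = 8;
  while (capacity < rows.size() * 2) {
    capacity *= 2;
  }
  table.slots.assign(capacity, -1);
  for (size_t i = 0; i < rows.size(); ++i) {
    size_t slot = rows[i].storedHash & (capacity - 1);
    while (table.slots[slot] != -1) {
      slot = (slot + 1) & (capacity - 1);
    }
    table.slots[slot] = static_cast<int32_t>(i);
  }
  return table;
}

// Returns the matching row index, or -1 when the key is absent.
inline int32_t probe(
    const ToyTable& table,
    const std::vector<ToyRow>& rows,
    int64_t key,
    uint64_t hash) {
  if (table.mode == ToyHashMode::kArray) {
    if (key < table.range.min || key > table.range.max) {
      return -1;
    }
    return table.slots[static_cast<size_t>(key - table.range.min)];
  }
  const size_t mask = table.slots.size() - 1;
  for (size_t slot = hash & mask; table.slots[slot] != -1;
       slot = (slot + 1) & mask) {
    const int32_t row = table.slots[slot];
    if (rows[static_cast<size_t>(row)].key == key) {
      return row;
    }
  }
  return -1;
}
```

In this toy, calling rebuild with ToyHashMode::kArray and a default (empty) ToyRangeState throws, which mirrors the point that the hasher state has to travel with the rows.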

Changes

  • Added native binary serde helpers in velox/common/serialization/NativeSerdeIO.h.
  • Added HashTable::serializedSize(), HashTable::serializeTo(...), and HashTable::deserializeFrom(...).
  • Serialized and restored:
    • hash table metadata and versioning,
    • key/dependent type information,
    • row data and stored hashes,
    • columnHasNulls_,
    • VectorHasher state,
    • normalized keys when present.
  • Added RowContainer::storeSerializedRow(std::string_view, char*) to restore rows directly from serialized row bytes.
  • Added VectorHasher state serialization/deserialization to preserve value-id/range state across processes.
  • Added binary serialization/deserialization for BigintValuesUsingBloomFilter.
  • Rebuilt the table during deserialization based on hash mode:
    • reused serialized hashes for kHash / kNormalizedKey,
    • recomputed value IDs for kArray to match restored VectorHasher state.
  • Added unit tests covering:
    • round-trip serialization/deserialization,
    • join probe after restore,
    • multiple data types, nulls, long strings, duplicates, and large inputs,
    • corrupted/invalid payload handling,
    • cross-process serialization scenarios,
    • VectorHasher state round-trip cases.
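
As a rough illustration of the versioned binary layout and the corrupted-payload handling listed above, here is a minimal sketch. It is not the interface of NativeSerdeIO.h or HashTable::serializeTo; ToyWriter, ToyReader, ToyPayload, kToyMagic, kToyVersion, serialize, and deserialize are hypothetical names, and the field order is an assumption made for the example. It writes values in host byte order, so it assumes the writer and reader share endianness.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <string_view>
#include <type_traits>
#include <vector>

// Hypothetical helpers for illustration; not the NativeSerdeIO.h API.
class ToyWriter {
 public:
  template <typename T>
  void write(T value) {
    static_assert(std::is_trivially_copyable_v<T>, "POD-like types only");
    // Host byte order: assumes writer and reader share endianness.
    buffer_.append(reinterpret_cast<const char*>(&value), sizeof(T));
  }
  // Length-prefixed byte string, e.g. one serialized row.
  void writeBytes(std::string_view bytes) {
    write<uint32_t>(static_cast<uint32_t>(bytes.size()));
    buffer_.append(bytes.data(), bytes.size());
  }
  const std::string& data() const { return buffer_; }

 private:
  std::string buffer_;
};

class ToyReader {
 public:
  explicit ToyReader(std::string_view input) : input_(input) {}
  template <typename T>
  T read() {
    static_assert(std::is_trivially_copyable_v<T>, "POD-like types only");
    require(sizeof(T));
    T value;
    std::memcpy(&value, input_.data() + offset_, sizeof(T));
    offset_ += sizeof(T);
    return value;
  }
  std::string_view readBytes() {
    const auto size = read<uint32_t>();
    require(size);
    std::string_view bytes = input_.substr(offset_, size);
    offset_ += size;
    return bytes;
  }
  bool atEnd() const { return offset_ == input_.size(); }

 private:
  // Every read is bounds-checked so truncated input fails loudly.
  void require(size_t n) const {
    if (n > input_.size() - offset_) {
      throw std::runtime_error("truncated payload");
    }
  }
  std::string_view input_;
  size_t offset_{0};
};

constexpr uint32_t kToyMagic = 0x48544231; // Arbitrary tag for this sketch.
constexpr uint32_t kToyVersion = 1;

struct ToyPayload {
  uint8_t hashMode{0};
  std::vector<std::string> rows; // Opaque serialized row bytes.
};

inline std::string serialize(const ToyPayload& payload) {
  ToyWriter writer;
  writer.write<uint32_t>(kToyMagic);
  writer.write<uint32_t>(kToyVersion);
  writer.write<uint8_t>(payload.hashMode);
  writer.write<uint32_t>(static_cast<uint32_t>(payload.rows.size()));
  for (const auto& row : payload.rows) {
    writer.writeBytes(row);
  }
  return writer.data();
}

inline ToyPayload deserialize(std::string_view bytes) {
  ToyReader reader(bytes);
  if (reader.read<uint32_t>() != kToyMagic) {
    throw std::runtime_error("bad magic");
  }
  if (reader.read<uint32_t>() != kToyVersion) {
    throw std::runtime_error("unsupported version");
  }
  ToyPayload payload;
  payload.hashMode = reader.read<uint8_t>();
  const auto numRows = reader.read<uint32_t>();
  // No reserve(numRows): a corrupted count must not trigger a huge allocation.
  for (uint32_t i = 0; i < numRows; ++i) {
    payload.rows.emplace_back(reader.readBytes());
  }
  if (!reader.atEnd()) {
    throw std::runtime_error("trailing bytes after payload");
  }
  return payload;
}
```

A round trip through serialize and deserialize returns the same hashMode and rows, while a payload with a flipped magic byte or a missing tail byte is rejected with std::runtime_error instead of being partially restored.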

Testing

  • Added HashTableSerializationTest
  • Added VectorHasher serialization round-trip tests

@JkSelf JkSelf requested a review from majetideepak as a code owner June 19, 2026 00:14
@JkSelf

JkSelf commented Jun 19, 2026

Collaborator Author

alchemy merge

@prestodb-ci

Collaborator

alchemy link ce74ff8

@prestodb-ci

Collaborator

Added new rebase item:

@prestodb-ci

Collaborator

Failed to cherry-pick commit ce74ff8 in rebase request #2165:

exit status 1
error: could not apply ce74ff81d7... feat: Implement se/dser method for HashTable

Auto-merging velox/exec/CMakeLists.txt
Auto-merging velox/exec/tests/CMakeLists.txt
CONFLICT (content): Merge conflict in velox/exec/tests/CMakeLists.txt

Please:

  1. Rebase your branch with staging/staging-rebase and fix the conflict. If the rebase item is a PR, you can change the base branch to this staging branch.

  2. Comment on this issue with the updated rebase item.

    For a PR (copy and edit):

    alchemy merge @2026-06-19T11:32:13Z
    

    For an issue (copy and edit):

    alchemy link [updated comma-separated commit SHAs for this issue] @2026-06-19T11:32:13Z
    
  3. Re-open Rebase branch staging-rebase (a93c9e8) with staging-rebase-head (1d05891) #2165 to retry the cherry-pick.

@JkSelf

JkSelf commented Jun 22, 2026

Collaborator Author

alchemy merge @2026-06-19T11:32:13Z

@prestodb-ci

Collaborator

alchemy link ce74ff8 @2026-06-19T11:32:13Z

@prestodb-ci

Collaborator

The following unexpired item was removed at 2026-06-19T11:32:13Z by @prestodb-ci via #2160 (comment):

Added new rebase item:

@prestodb-ci

Collaborator

Failed to cherry-pick commit ce74ff8 in rebase request #2168:

exit status 1
error: could not apply ce74ff81d7... feat: Implement se/dser method for HashTable

Auto-merging velox/exec/CMakeLists.txt
Auto-merging velox/exec/tests/CMakeLists.txt
CONFLICT (content): Merge conflict in velox/exec/tests/CMakeLists.txt

Please:

  1. Rebase your branch with staging/staging-rebase and fix the conflict. If the rebase item is a PR, you can change the base branch to this staging branch.

  2. Comment on this issue with the updated rebase item.

    For a PR (copy and edit):

    alchemy merge @2026-06-20T05:43:02Z
    

    For an issue (copy and edit):

    alchemy link [updated comma-separated commit SHAs for this issue] @2026-06-20T05:43:02Z
    
  3. Re-open Rebase branch staging-rebase (a93c9e8) with staging-rebase-head (7096ab4) #2168 to retry the cherry-pick.

@JkSelf

JkSelf commented Jun 22, 2026

Collaborator Author

alchemy merge @2026-06-20T05:43:02Z

@prestodb-ci

Collaborator

alchemy link 2ca1ac8 @2026-06-20T05:43:02Z

@prestodb-ci

Collaborator

The following unexpired item was removed at 2026-06-20T05:43:02Z by @prestodb-ci via #2160 (comment):

Added new rebase item:

@prestodb-ci

Collaborator

Failed to cherry-pick commit 2ca1ac8 in rebase request #2174:

exit status 1
error: could not apply 2ca1ac80cb... feat: Implement se/dser method for HashTable

Auto-merging velox/exec/CMakeLists.txt
Auto-merging velox/exec/tests/CMakeLists.txt
CONFLICT (content): Merge conflict in velox/exec/tests/CMakeLists.txt

Please:

  1. Rebase your branch with staging/staging-rebase and fix the conflict. If the rebase item is a PR, you can change the base branch to this staging branch.

  2. Comment on this issue with the updated rebase item.

    For a PR (copy and edit):

    alchemy merge @2026-06-23T06:49:46Z
    

    For an issue (copy and edit):

    alchemy link [updated comma-separated commit SHAs for this issue] @2026-06-23T06:49:46Z
    
  3. Re-open Rebase branch staging-rebase (a93c9e8) with staging-rebase-head (5a397b2) #2174 to retry the cherry-pick.

@JkSelf JkSelf force-pushed the serialize-hashtable branch from 2ca1ac8 to 5624f6e Compare June 23, 2026 07:22
@JkSelf

JkSelf commented Jun 23, 2026

Collaborator Author

alchemy merge @2026-06-23T06:49:46Z

@prestodb-ci

Collaborator

alchemy link 5624f6e @2026-06-23T06:49:46Z

@prestodb-ci

Collaborator

The following unexpired item was removed at 2026-06-23T06:49:46Z by @prestodb-ci via #2160 (comment):

Added new rebase item:
