Bump sentencepiece submodule to fix GCC 15 build #193
Merged
Bumps the sentencepiece submodule from d8f7418 (Aug 2024) to bcc6390 (Jul 2025, google/sentencepiece#1109). The key change adds a missing `#include <cstdint>` to `sentencepiece_processor.h`. Without it, the build fails under GCC 15 (Ubuntu 26.04), whose standard library headers no longer pull in `<cstdint>` transitively.

This unblocks pytorch/executorch#19917 (RISC-V baremetal CI), which uses the gcc15 docker image.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
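To illustrate the failure mode, here is a minimal sketch, assuming a header that names fixed-width integer types the way `sentencepiece_processor.h` does. The `Piece` struct and `PieceIds` helper are hypothetical and not part of SentencePiece's API; the only point is that `<cstdint>` is included directly rather than relied on through `<string>` or `<vector>`.

```cpp
// Sketch only: Piece and PieceIds are hypothetical, not SentencePiece API.
// The explicit include below is the fix pattern. Without it, std::uint32_t
// is only declared if some other standard header happens to include
// <cstdint>, which GCC 15's libstdc++ no longer guarantees.
#include <cstdint>  // std::uint32_t; do not rely on transitive includes
#include <string>
#include <vector>

// A token piece paired with its vocabulary id.
struct Piece {
  std::string text;
  std::uint32_t id;
};

// Collect the ids of the given pieces, preserving their order.
inline std::vector<std::uint32_t> PieceIds(const std::vector<Piece>& pieces) {
  std::vector<std::uint32_t> ids;
  ids.reserve(pieces.size());
  for (const Piece& p : pieces) {
    ids.push_back(p.id);
  }
  return ids;
}
```

Removing the `#include <cstdint>` line may still compile on older toolchains that happen to include it transitively, which is why this class of bug tends to surface only after a compiler upgrade.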
kirklandsign approved these changes Jun 8, 2026
rascani added a commit to rascani/executorch that referenced this pull request Jun 9, 2026
The tokenizers submodule bump (meta-pytorch/tokenizers#193) changed `CMAKE_CXX_STANDARD` from 17 to 20. Under C++20 the `u8"▁"` literal is `const char8_t[]`, which has no implicit conversion to `const char*` and breaks `std::string::rfind`. Spell the SentencePiece word-boundary marker as raw UTF-8 bytes, matching the fix already on the 1.3 release branch (pytorch#19824).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
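A minimal sketch of the C++20 breakage described in this commit message and the raw-bytes workaround. The names `kWordBoundary` and `LastBoundary` are hypothetical, not the identifiers used in the actual fix; the bytes `E2 96 81` are the UTF-8 encoding of `▁` (U+2581).

```cpp
// Sketch only: kWordBoundary and LastBoundary are hypothetical names.
#include <cstddef>
#include <string>

// Under -std=c++20, u8"\u2581" has type const char8_t[4], which does not
// convert to const char*, so this line would fail to compile:
//   std::string("a b").rfind(u8"\u2581");
// Spelling the marker as raw UTF-8 bytes in a plain char literal compiles
// the same way under both C++17 and C++20.
constexpr const char kWordBoundary[] = "\xe2\x96\x81";  // U+2581 in UTF-8

// Byte offset of the last word-boundary marker in s, or std::string::npos.
inline std::size_t LastBoundary(const std::string& s) {
  return s.rfind(kWordBoundary);
}
```

Because the literal is plain `char`, the same source builds unchanged whether `CMAKE_CXX_STANDARD` is 17 or 20.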
rascani added a commit to pytorch/executorch that referenced this pull request Jun 9, 2026
### Summary
Updates extension/llm/tokenizers to include meta-pytorch/tokenizers#193, which bumps the sentencepiece submodule to pick up a missing `#include <cstdint>` (google/sentencepiece#1109). Without this, `pytorch_tokenizers` fails to compile inside the `executorch-ubuntu-26.04-gcc15` docker image, blocking the RISC-V baremetal CI (#19917).

### Test plan
CI

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
- Bumps `third-party/sentencepiece` from d8f7418 (Aug 2024) to bcc6390 (Jul 2025)
- Picks up the missing `#include <cstdint>` in `sentencepiece_processor.h` that causes compilation failures under GCC 15

Motivation
Ubuntu 26.04 ships GCC 15, whose libstdc++ headers no longer include `<cstdint>` transitively. This breaks the `pytorch_tokenizers` build when `sentencepiece` is compiled from source inside the GCC 15 docker image.

This unblocks pytorch/executorch#19917 (RISC-V baremetal CI), which needs the `executorch-ubuntu-26.04-gcc15` image for the `riscv64-unknown-elf` cross-compiler and picolibc packages.

Changes between d8f7418..bcc6390 (15 commits)
All low-risk: README updates, build tooling (cibuildwheel bump), a unigram training crash fix, Python 3.13 support, AIX porting, and the cstdint fix.
🤖 Generated with Claude Code