
Bump sentencepiece submodule to fix GCC 15 build#193

Merged
rascani merged 1 commit into main from bump-sentencepiece-cstdint
Jun 8, 2026


@rascani rascani commented Jun 8, 2026

Contributor

Summary

  • Bumps third-party/sentencepiece from d8f7418 (Aug 2024) to bcc6390 (Jul 2025)
  • The key fix is google/sentencepiece#1109: a missing #include <cstdint> in sentencepiece_processor.h that causes compilation failures under GCC 15

Motivation

Ubuntu 26.04 ships GCC 15, whose standard library headers no longer pull in <cstdint> transitively, so a header that uses fixed-width integer types without including <cstdint> itself fails to compile. This breaks the pytorch_tokenizers build when sentencepiece is compiled from source inside the GCC 15 docker image.

This unblocks pytorch/executorch#19917 (RISC-V baremetal CI), which needs the executorch-ubuntu-26.04-gcc15 image for the riscv64-unknown-elf cross-compiler + picolibc packages.
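
For readers unfamiliar with this class of failure, here is a minimal sketch of the pattern behind the fix. The function `byte_sum` is hypothetical and does not come from sentencepiece; it only illustrates a header-style snippet that uses a fixed-width type and includes `<cstdint>` explicitly instead of relying on `<string>` or `<vector>` to provide it.

```cpp
// Illustrative only: byte_sum is a made-up helper, not sentencepiece code.
#include <cstdint>  // explicit include; do not rely on other headers to pull this in
#include <string>

// Adds up the byte values of `text` in a fixed-width 32-bit accumulator.
inline std::uint32_t byte_sum(const std::string& text) {
  std::uint32_t total = 0;
  for (unsigned char c : text) {
    total += c;
  }
  return total;
}
```

If the `#include <cstdint>` line were dropped, whether this compiles would depend on what the other standard headers happen to include, which is the portability problem the upstream fix addresses.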

Changes between d8f7418..bcc6390 (15 commits)

All low-risk: README updates, build tooling (cibuildwheel bump), a unigram training crash fix, Python 3.13 support, AIX porting, and the cstdint fix.

🤖 Generated with Claude Code

Bumps the sentencepiece submodule from d8f7418 (Aug 2024) to bcc6390
(Jul 2025, google/sentencepiece#1109). The key change is a missing
`#include <cstdint>` in sentencepiece_processor.h that causes build
failures under GCC 15 (Ubuntu 26.04), which no longer implicitly
includes cstdint through transitive headers.

This unblocks pytorch/executorch#19917 (RISC-V baremetal CI) which
uses the gcc15 docker image.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@meta-cla meta-cla Bot added the CLA Signed label Jun 8, 2026
@rascani rascani requested a review from kirklandsign June 8, 2026 21:24
@rascani rascani merged commit 3f98e99 into main Jun 8, 2026
9 checks passed
@rascani rascani deleted the bump-sentencepiece-cstdint branch June 8, 2026 22:18
rascani added a commit to rascani/executorch that referenced this pull request Jun 9, 2026
The tokenizers submodule bump (meta-pytorch/tokenizers#193) changed
CMAKE_CXX_STANDARD from 17 to 20. Under C++20 the u8"▁" literal is
const char8_t[], which has no implicit conversion to const char* and
breaks std::string::rfind.

Spell the SentencePiece word-boundary marker as raw UTF-8 bytes,
matching the fix already on the 1.3 release branch (pytorch#19824).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
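
The `char8_t` issue described in the commit message above can be sketched as follows. This is an assumption-laden illustration, not the actual executorch change: the names `kWordBoundary` and `last_word_boundary` are hypothetical, and the exact spelling used in the real fix may differ. It shows the marker U+2581 written as escaped UTF-8 bytes so that it stays a `const char[]` under both C++17 and C++20.

```cpp
// Illustrative only: names below are hypothetical, not taken from executorch.
#include <cstddef>
#include <string>

// U+2581 (the SentencePiece word-boundary marker "▁") as raw UTF-8 bytes.
// A plain narrow literal is const char[] in every standard mode, whereas
// u8"▁" has type const char8_t[] under C++20 and cannot be passed to
// std::string::rfind.
constexpr const char kWordBoundary[] = "\xE2\x96\x81";

// Returns the byte offset of the last marker in `piece`,
// or std::string::npos if the marker does not occur.
inline std::size_t last_word_boundary(const std::string& piece) {
  return piece.rfind(kWordBoundary);
}
```

Spelling the bytes out avoids both the C++20 type change and any dependence on the source file's encoding.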
rascani added a commit to pytorch/executorch that referenced this pull request Jun 9, 2026
### Summary
Updates extension/llm/tokenizers to include
meta-pytorch/tokenizers#193, which bumps the sentencepiece submodule to
pick up a missing `#include <cstdint>` (google/sentencepiece#1109).

Without this, `pytorch_tokenizers` fails to compile inside the
`executorch-ubuntu-26.04-gcc15` docker image, blocking the RISC-V
baremetal CI (#19917).

### Test plan
CI

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>