ppc64le #291

Draft
aous72 wants to merge 10 commits into
master from
ppc64le

aous72 commented Jun 17, 2026

Owner

Adds support for powerpc64le.
The original contributor is @runlevel5 in #282.

runlevel5 and others added 4 commits June 17, 2026 13:29
Native 128-bit VSX implementations of the wavelet, colour, and
codestream kernels and the HTJ2K block decoder, with runtime
dispatch via hwcap. Supported targets are POWER9 (ISA 3.0) and
newer, little-endian only; other PPC targets use the generic code
paths.
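
As a rough illustration of the hwcap-based dispatch described above (not the code in this PR), the sketch below classifies a CPU from the Linux auxiliary vector. The names `PpcLevel`, `classify_ppc`, and `detect_ppc_level` are hypothetical; the bit values mirror `PPC_FEATURE_HAS_VSX`, `PPC_FEATURE2_ARCH_3_00`, and `PPC_FEATURE2_ARCH_3_1` from the kernel's powerpc `<asm/cputable.h>` and are repeated here only so the sketch compiles on any host.

```cpp
#include <cstdint>
#if defined(__linux__) && defined(__powerpc64__)
#include <sys/auxv.h>  // getauxval, AT_HWCAP, AT_HWCAP2
#endif

// Bit values copied from the Linux powerpc <asm/cputable.h> (assumption:
// a real build would include that header instead of repeating them).
constexpr unsigned long kHwcapHasVsx   = 0x00000080UL;  // PPC_FEATURE_HAS_VSX, in AT_HWCAP
constexpr unsigned long kHwcap2Arch300 = 0x00800000UL;  // PPC_FEATURE2_ARCH_3_00, in AT_HWCAP2
constexpr unsigned long kHwcap2Arch31  = 0x00040000UL;  // PPC_FEATURE2_ARCH_3_1, in AT_HWCAP2

// Hypothetical capability levels used for picking kernels.
enum class PpcLevel { Generic, Power9, Power10 };

// Pure classification step, kept separate so it can be tested anywhere.
PpcLevel classify_ppc(unsigned long hwcap, unsigned long hwcap2) {
  if ((hwcap & kHwcapHasVsx) == 0) return PpcLevel::Generic;  // no VSX at all
  if (hwcap2 & kHwcap2Arch31)  return PpcLevel::Power10;      // ISA 3.1
  if (hwcap2 & kHwcap2Arch300) return PpcLevel::Power9;       // ISA 3.0
  return PpcLevel::Generic;                                   // POWER8 and older
}

// Reads the auxiliary vector on little-endian 64-bit PPC Linux; every other
// target reports Generic and so falls back to the generic code paths.
PpcLevel detect_ppc_level() {
#if defined(__linux__) && defined(__powerpc64__) && defined(__LITTLE_ENDIAN__)
  return classify_ppc(getauxval(AT_HWCAP), getauxval(AT_HWCAP2));
#else
  return PpcLevel::Generic;
#endif
}
```

The ISA 3.1 bit is checked first because a POWER10 kernel reports the ISA 3.0 bit as well.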

Beyond a straight port, the kernels use POWER-specific forms where
measurement showed a win: xvrspi for round-to-nearest-away in the
float-to-int conversions, vec_sel for masked selects, and a block
decoder that destuffs the MagSgn bitstream upfront so per-quad bit
consumption is a GPR add instead of a vector-window shift.
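
To make the "destuff upfront" idea concrete, here is a scalar sketch, not the VSX code from this PR. It assumes the stuffing rule that a byte following 0xFF carries only its low 7 bits and that bits are consumed LSB-first. `DestuffedBits`, `destuff_lsb_first`, and `peek_bits` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Contiguous, stuffing-free bit buffer (bits stored LSB-first in each byte).
struct DestuffedBits {
  std::vector<uint8_t> bytes;
  size_t bit_count = 0;
};

// One pass over the raw stream: drop the stuffed bit after every 0xFF byte
// and repack the remaining bits back to back.
DestuffedBits destuff_lsb_first(const std::vector<uint8_t>& src) {
  DestuffedBits out;
  uint64_t acc = 0;        // pending bits not yet flushed to out.bytes
  unsigned acc_bits = 0;   // number of valid bits in acc (always < 8 between bytes)
  bool prev_ff = false;    // was the previous raw byte 0xFF?
  for (uint8_t b : src) {
    const unsigned nbits = prev_ff ? 7u : 8u;
    acc |= static_cast<uint64_t>(b & ((1u << nbits) - 1u)) << acc_bits;
    acc_bits += nbits;
    out.bit_count += nbits;
    prev_ff = (b == 0xFF);
    while (acc_bits >= 8) {
      out.bytes.push_back(static_cast<uint8_t>(acc & 0xFF));
      acc >>= 8;
      acc_bits -= 8;
    }
  }
  if (acc_bits > 0) out.bytes.push_back(static_cast<uint8_t>(acc & 0xFF));
  return out;
}

// Read n (<= 32) bits starting at bit position pos; bits past the end read as 0.
uint32_t peek_bits(const DestuffedBits& s, size_t pos, unsigned n) {
  uint32_t v = 0;
  for (unsigned i = 0; i < n; ++i) {
    const size_t p = pos + i;
    const uint32_t bit = (p < s.bit_count) ? ((s.bytes[p >> 3] >> (p & 7)) & 1u) : 0u;
    v |= bit << i;
  }
  return v;
}
```

Once the stream is destuffed, consuming `k` bits for a quad is just `pos += k` on an integer position, which is the GPR add the commit message refers to; no stuffing check is needed inside the decode loop.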

The SIMD block decoder is dispatched everywhere on POWER10, and for
irreversible tile components on POWER9, where it beats the scalar
decoder; reversible content on POWER9 stays scalar, which is
slightly faster there.
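
The selection rule in the paragraph above can be written as a small decision function. This is a sketch of the stated policy only; `CpuLevel`, `BlockDecoder`, and `choose_block_decoder` are hypothetical names, not identifiers from the PR.

```cpp
// Hypothetical CPU classes; Generic covers everything older than POWER9.
enum class CpuLevel { Generic, Power9, Power10 };
enum class BlockDecoder { Scalar, Simd };

// Policy as described: SIMD everywhere on POWER10; on POWER9 only for
// irreversible tile components; scalar otherwise.
BlockDecoder choose_block_decoder(CpuLevel cpu, bool reversible) {
  switch (cpu) {
    case CpuLevel::Power10:
      return BlockDecoder::Simd;
    case CpuLevel::Power9:
      return reversible ? BlockDecoder::Scalar : BlockDecoder::Simd;
    default:
      return BlockDecoder::Scalar;
  }
}
```

Keeping the policy in one function makes it easy to revisit if later measurements change the POWER9 reversible case.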


Assisted-by: Lance Albertson <lance@osuosl.org>
Assisted-by: Thushan Fernando <thushan@thushanfernando.com>

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
@aous72 aous72 marked this pull request as draft June 17, 2026 05:48
aous72 commented Jun 17, 2026

Owner Author

@runlevel5.
I tested CI on this PR.
The code does not compile on Ubuntu 22.04 (__builtin_shufflevector was added in GCC 12).
The code compiles and passes the test on Ubuntu 24.04.
The code compiles but fails the test on Ubuntu_latest.
Could you please advise?

Kind regards,
Aous.

2 participants