arm-neon-eisenstein-bench

Four Eisenstein norm computations in five NEON instructions.

ARM NEON benchmarks for the eisenstein crate's integer arithmetic. Each int32x4_t vector holds four 32-bit lanes, so four Eisenstein norms a² − ab + b² are computed in parallel with integer multiply-accumulate (MLA) and multiply-subtract (MLS). Measured throughput is 3.3× that of scalar code on a Cortex-A72. The theoretical ceiling is 4×, one result per lane; the shortfall is attributed to register pressure on the accumulator.
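For reference, here is where the formula comes from and what it implies for 32-bit lanes. This assumes the usual convention z = a + bω with ω a primitive cube root of unity, and signed 32-bit components as the int32x4_t lanes suggest. Lane arithmetic wraps modulo 2³², so a lane's result is exact whenever the true norm fits in an i32.

```latex
\begin{aligned}
\omega &= e^{2\pi i/3}, \qquad \omega + \bar\omega = -1, \qquad \omega\bar\omega = 1,\\
N(a + b\omega) &= (a + b\omega)(a + b\bar\omega)
  = a^2 + ab(\omega + \bar\omega) + b^2\,\omega\bar\omega
  = a^2 - ab + b^2,\\
|a|,|b| \le M &\implies 0 \le a^2 - ab + b^2 \le 3M^2 \quad (\text{maximum at } a = M,\ b = -M),\\
3M^2 \le 2^{31} - 1 &\iff M \le 26754 .
\end{aligned}
```

The bound is sufficient, not necessary: many pairs with larger components still have norms that fit in 32 bits.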

The NEON Sequence

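// Four norm computations in five instructions.
// Assumed: x0 -> eight interleaved i32s (a0, b0, a1, b1, ...), x1 -> four i32 outputs.
ld2   {v0.4s, v1.4s}, [x0]   // de-interleave: v0 = a lanes, v1 = b lanes
mul   v2.4s, v0.4s, v0.4s    // a²
mla   v2.4s, v1.4s, v1.4s    // a² + b²
mls   v2.4s, v0.4s, v1.4s    // a² + b² - ab
st1   {v2.4s}, [x1]          // store four norms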
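A rough Rust equivalent of the per-lane arithmetic, written against the core::arch::aarch64 intrinsics, is sketched below. It assumes the input is eight interleaved i32 values (a0, b0, a1, b1, ...); the names norms4, norms4_neon and norm_wrapping are hypothetical and not taken from the crate. On other architectures it falls back to a scalar loop with the same wrapping 32-bit arithmetic, so results match lane for lane.

```rust
/// Scalar reference with 32-bit wrapping arithmetic, matching one NEON lane.
fn norm_wrapping(a: i32, b: i32) -> i32 {
    a.wrapping_mul(a)
        .wrapping_add(b.wrapping_mul(b))
        .wrapping_sub(a.wrapping_mul(b))
}

/// NEON path: four norms from eight interleaved i32s [a0, b0, a1, b1, ...].
#[cfg(target_arch = "aarch64")]
#[target_feature(enable = "neon")]
unsafe fn norms4_neon(pairs: &[i32; 8], out: &mut [i32; 4]) {
    use core::arch::aarch64::{vld2q_s32, vmlaq_s32, vmlsq_s32, vmulq_s32, vst1q_s32};
    unsafe {
        let ab = vld2q_s32(pairs.as_ptr()); // ld2: a lanes in .0, b lanes in .1
        let (a, b) = (ab.0, ab.1);
        let n = vmulq_s32(a, a); // mul: a²
        let n = vmlaq_s32(n, b, b); // mla: a² + b²
        let n = vmlsq_s32(n, a, b); // mls: a² + b² - ab
        vst1q_s32(out.as_mut_ptr(), n); // st1: store the four norms
    }
}

/// Four norms at once: NEON on aarch64, the scalar reference elsewhere.
pub fn norms4(pairs: &[i32; 8]) -> [i32; 4] {
    let mut out = [0i32; 4];
    #[cfg(target_arch = "aarch64")]
    {
        // SAFETY: NEON is a baseline feature of the standard aarch64 Rust targets,
        // and the fixed-size arrays keep all reads and writes in bounds.
        unsafe { norms4_neon(pairs, &mut out) };
        // Debug-build drift check: every lane must equal the scalar reference.
        for (i, &o) in out.iter().enumerate() {
            debug_assert_eq!(o, norm_wrapping(pairs[2 * i], pairs[2 * i + 1]));
        }
    }
    #[cfg(not(target_arch = "aarch64"))]
    for (i, o) in out.iter_mut().enumerate() {
        *o = norm_wrapping(pairs[2 * i], pairs[2 * i + 1]);
    }
    out
}
```

Note that vld2q_s32 and vst1q_s32 take raw pointers and must be called from unsafe code; taking `&[i32; 8]` and `&mut [i32; 4]` rather than slices is what makes the pointer accesses provably in bounds.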

Five instructions compute four norms. No branch. No load-hit-store. No SVE or SME required: just the baseline NEON (Advanced SIMD) unit of ARMv8-A cores.

Results

3.3× throughput over scalar on Cortex-A72 (measured, 5-run median)
4× theoretical maximum (one result per 32-bit lane)
Zero drift: bit-identical results to the same integer arithmetic in the Rust crate
Zero unsafe: NEON intrinsics, no inline assembly in the Rust wrapper
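The figures above are medians of five runs. A minimal sketch of that kind of measurement, using only std::time::Instant and std::hint::black_box, might look like the following. The repository's real harness is not shown here; median_time, scalar_pass, bench_scalar and speedup are hypothetical names.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Median of the collected timings (the upper median when the count is even).
fn median(mut samples: Vec<Duration>) -> Duration {
    assert!(!samples.is_empty(), "need at least one sample");
    samples.sort();
    samples[samples.len() / 2]
}

/// Run `f` `runs` times, timing each run separately, and return the median.
fn median_time<F: FnMut()>(runs: usize, mut f: F) -> Duration {
    let samples: Vec<Duration> = (0..runs)
        .map(|_| {
            let start = Instant::now();
            f();
            start.elapsed()
        })
        .collect();
    median(samples)
}

/// Eisenstein norm with wrapping 32-bit arithmetic (same as one NEON lane).
fn norm_wrapping(a: i32, b: i32) -> i32 {
    a.wrapping_mul(a)
        .wrapping_add(b.wrapping_mul(b))
        .wrapping_sub(a.wrapping_mul(b))
}

/// Hypothetical scalar workload: wrapping sum of norms over interleaved (a, b) pairs.
fn scalar_pass(pairs: &[i32]) -> i32 {
    pairs
        .chunks_exact(2)
        .fold(0i32, |acc, p| acc.wrapping_add(norm_wrapping(p[0], p[1])))
}

/// Median time of the scalar pass; black_box keeps the work from being optimized away.
fn bench_scalar(pairs: &[i32], runs: usize) -> Duration {
    median_time(runs, || {
        black_box(scalar_pass(black_box(pairs)));
    })
}

/// Speedup of `candidate` relative to `baseline` (e.g. scalar time / NEON time).
fn speedup(baseline: Duration, candidate: Duration) -> f64 {
    baseline.as_secs_f64() / candidate.as_secs_f64()
}
```

Timing the scalar pass and a NEON pass with five runs each, then passing both medians to speedup, yields a ratio of the same form as the 3.3× figure. Using the median rather than the mean keeps a single preempted run from skewing the result.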

Build

cargo build --release
./target/release/arm-neon-eisenstein-bench

Requires an AArch64 (64-bit ARMv8) target with NEON.
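A binary could also check for NEON at startup and exit with a clear message instead of failing later. Below is a sketch built on the standard library's is_aarch64_feature_detected! macro; neon_available and check_target are hypothetical names. On the standard aarch64 Rust targets NEON is enabled at compile time, so the check simply returns true there.

```rust
/// True when the running CPU reports NEON (Advanced SIMD).
#[cfg(target_arch = "aarch64")]
pub fn neon_available() -> bool {
    std::arch::is_aarch64_feature_detected!("neon")
}

/// On non-aarch64 builds the NEON path is not compiled, so report false.
#[cfg(not(target_arch = "aarch64"))]
pub fn neon_available() -> bool {
    false
}

/// Startup guard: Ok on a NEON-capable AArch64 CPU, otherwise an explanatory error.
pub fn check_target() -> Result<(), &'static str> {
    if neon_available() {
        Ok(())
    } else {
        Err("this benchmark requires an AArch64 CPU with NEON (Advanced SIMD)")
    }
}
```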

License

MIT OR Apache-2.0
