build: -fno-math-errno -- drop unused errno side-effect on libm calls (~4% faster, output identical) #10

Open: carstenerickson wants to merge 1 commit into danjlawson:main from carstenerickson:perf/cp-fno-math-errno

What

Append -fno-math-errno to the project OPTIMIZATION flags -- one line in Makefile.am.

Why

ChromoPainter's forward/backward inner loops call exp()/log() once per locus per donor -- the dominant transcendental cost in a paint. By default each scalar libm call must also set errno on a domain/range error. The code never reads errno after a math call, so that side effect is pure overhead on the hot path. -fno-math-errno drops it.

Why it is safe (no behavior change)

  • errno is never read after a math call. Audited the whole tree: zero errno references in any ChromoPainter or finestructure source. The only errno users are the bundled zlib (cp/zutil.c, cp/gzguts.h), which read it for file-I/O errors -- -fno-math-errno affects only libm math functions, not I/O, so zlib is untouched.
  • Output is byte-identical. The flag changes the errno side effect only, not any computed value. chunkcounts and chunklengths are md5-identical to the unmodified build on both a full chromosome and a subset (table below).
  • This is not -ffast-math. No -ffinite-math-only (NaN/Inf likelihood guards stay live), no reassociation, no -march. -funsafe-math-optimizations is already in OPTIMIZATION; this removes only the remaining errno barrier.

Measurement

AMD EPYC, gcc 11.4, 4 threads, single recipient vs the EUR donor panel, -s 0. Same base built with and without the flag.

| workload | base | + -fno-math-errno | delta | output |
| --- | --- | --- | --- | --- |
| full chr11 (~3M SNPs, -i 1) | 294 s | 282 s | -3.9% | chunkcounts/chunklengths md5-identical |
| 100k-SNP subset (-i 10, best-of-3) | 71.9 s | 68.8 s | -4.3% | md5-identical |

Peak RSS unchanged.

Scope

One line in Makefile.am (-fno-math-errno appended to OPTIMIZATION). No source changes, no -march, no -ffast-math.

The ChromoPainter forward/backward inner loops call exp()/log() once per
locus per donor -- the dominant transcendental cost in a paint. By default
each scalar libm call also sets errno on a domain or range
error. Nothing in the code reads errno after a math call (only the bundled
zlib's I/O paths touch errno, which this flag does not affect), so that
side effect is dead weight on the hot path.

-fno-math-errno drops it. Measured ~4% faster on a single-recipient paint
(chr11, ~3M SNPs, -s 0, 4 threads: 294 -> 282 s; 100k-SNP subset, best-of-3:
71.9 -> 68.8 s), with chunkcounts/chunklengths md5-identical to the
unmodified build -- the flag changes only the errno write, not any result.
This is not -ffast-math: finiteness guards and FP association are unchanged.