Skip to content

Migrate faiss OpenMP-off consumers to canonical //faiss:faiss + use_omp_mock flag#5295

Open
mnorris11 wants to merge 1 commit into
facebookresearch:mainfrom
mnorris11:export-D108092579
Open

Migrate faiss OpenMP-off consumers to canonical //faiss:faiss + use_omp_mock flag#5295
mnorris11 wants to merge 1 commit into
facebookresearch:mainfrom
mnorris11:export-D108092579

Conversation

@mnorris11

@mnorris11 mnorris11 commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

No description provided.

…mp_mock flag

Summary:
## TLDR: repoint every other target besides MYSQL to just `faiss` target along with use_omp_mock flag

## Problem

`fblearner/m4/service:m4_service` failed to link in `mode/opt` + `fbcode.use_link_groups=true` with duplicate FAISS symbols (`faiss::FaissException`, `faiss::WorkerThread`, `faiss::RandomGenerator`, ...). Root cause: `//faiss:faiss`, `//faiss:faiss_no_multithreading`, `//faiss:faiss_omp_mock`, and the `*_instrumented_*` variants all compile the *same* FAISS sources with different OpenMP flags, so each emits the full set of FAISS symbols. Any binary that links two of them gets an ODR / duplicate-symbol error. `m4_service` reached three at once (`:faiss` via mui, `:faiss_omp_mock` via Laser KNN, `:faiss_no_multithreading` via m4_package_manager + Laser FANN/LCC + Galileo). It only surfaces in merged static links (`mode/opt` + link groups); dev/shared builds resolve the duplicates at runtime, which is why it had been latent.

## The (eventual) design: one `:faiss` target + the `use_omp_mock` buckconfig

The FAISS-team-sanctioned mechanism (comment at the top of `faiss/BUCK`, added in D103765931) is: a binary that pulls FAISS through multiple paths uses the single canonical `//faiss:faiss` everywhere, and gets OpenMP-disabled behavior — when needed — by building with `-c faiss.use_omp_mock=true`, NOT by depending on a separate target.

Why the flag is better when you control the build (a service deployed through a single fbpkg):
- **ODR-safe by construction.** There is only ever one `:faiss` target in the graph, so two FAISS compilations can never be co-linked. A separate `*_omp_mock` / `*_no_multithreading` target is a footgun for any *shared library*: the moment a consumer of that lib also pulls canonical `:faiss` (via velox/caffe2/mui/etc.), the binary links two FAISS copies and breaks. That is exactly the m4_service failure, and it is structurally impossible with the flag.
- **Single control point.** OpenMP policy becomes a property of the *binary owner* (one `buck_opts.config` entry on the fbpkg), not behavior baked into a shared library that silently forces it on every downstream consumer.
- **Zero-overhead, verified equivalent.** With the flag, `:faiss` compiles with `omp_mock.h` (every `omp_*` call becomes a compile-time constant), `-fno-openmp`, and sequential MKL — byte-equivalent OpenMP elimination to `:faiss_omp_mock`. Verified with `nm`: canonical `:faiss` has 28 `__kmpc_*`/`__kmp_*` runtime references; with `-c faiss.use_omp_mock=true` it has 0.
- **Lets the redundant targets be deleted** (follow-up diff), shrinking FAISS's surface.

## Why we are removing the `*_no_multithreading` targets specifically

`faiss_no_multithreading` is the *weaker, known-insufficient* form of OpenMP-off. It only passes `-fno-openmp` (disables `#pragma omp` regions) but still includes the real `omp.h`, still links `libomp`, and still executes FAISS's explicit `omp_*()` runtime calls — so OpenMP thread *registration* and the bootstrap lock still run. That registration is precisely what caused the production crashes that motivated the mock (per D87447901 / SEVs S583356, S596410, where MySQL crashed under high concurrency *even with* `faiss_no_multithreading`). `:faiss` + `use_omp_mock` is strictly more complete (it also no-ops the runtime calls and drops `libomp` entirely), so every `no_multithreading` consumer is better served by the flag. Keeping the target only invites new ODR breakage like this one.

(`omp_mock` no-ops OpenMP *locks*; safe for the read-path / centroid-assignment / PQ-encode serving uses here. Consumers doing concurrent HNSW graph mutation that rely on FAISS's internal OMP locks should validate — none in this diff's scope do.)

## What this diff does

Repoints every FAISS-variant consumer to canonical `//faiss:faiss` / `//faiss:faiss_instrumented` (BUCK deps + `manual` include annotations), and adds `faiss.use_omp_mock=true` to each affected *serving* fbpkg so OpenMP-off behavior is preserved.

Consumers repointed: laser (knn, fann_storage, lcc), galileo (common, query, unifiedindex), lcc (encoder/compute/processor), soma/apps/faiss_knn, vector_search test service, assistant knowledge search, qi reducer_transform test, commerce retrieval, fblearner/m4 (resource + subgraph_runner/spark_transformer, incl. removing its now-dead `faiss-no-mt` link-group workaround), faiss perf_tests.

Serving fbpkgs given `use_omp_mock=true` (preserving prior OpenMP-off): laser `db_leaf` (`laser/BUCK`), laser indexer (`laser/BUCK`), galileo leaf (`galileo/leafservice/BUCK`), galileo offline indexer (`galileo/offline/BUCK`), WH Eunomia Hermes update service (`eunomia/v2/hermes/BUCK`), SIP/lcc (`pubcontent/sip/service/BUCK`), soma import tools (`soma/tools/import/BUCK`), vectordb test svc. Eunomia Scribe update-service already has it (D107764083). M4 binaries (m4_service, spark_transformer) intentionally stay OpenMP-on `:faiss` — m4 is not the OpenMP-contention path.

An independent completeness audit of the serving binaries that link the repointed libs surfaced three fbpkgs initially missed (WH Eunomia Hermes, galileo offline indexer, laser indexer) — now flagged above. The control-plane `laser.py.laser_admin` CLI also links these libs and was OpenMP-off before; it is left OpenMP-on (not a high-QPS server) — laser owners should confirm or add the flag if desired. Each flagged serving binary was verified to link exactly one FAISS compilation (no hidden velox/caffe2 OpenMP-on faiss path that the binary-global flag would silently mock).

No FAISS targets are deleted here (stacked follow-up does that). `//faiss:faiss_omp_mock` is intentionally retained for MySQL/XDB — see the follow-up diff for why it cannot move to the flag.

Note for reviewers — each owning team please confirm the `use_omp_mock=true` addition on your serving fbpkg matches your deployment: laser (`db_leaf`), galileo (leaf), lcc/SIP, soma (import tools), vector_search.

Differential Revision: D108092579
@meta-codesync

meta-codesync Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

@mnorris11 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D108092579.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant