From da935795210d074783611b195ce94a59b5a26a5d Mon Sep 17 00:00:00 2001 From: Michael Norris Date: Wed, 10 Jun 2026 09:26:21 -0700 Subject: [PATCH] Migrate faiss OpenMP-off consumers to canonical //faiss:faiss + use_omp_mock flag MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Summary: ## TLDR: repoint every other target besides MYSQL to just `faiss` target along with use_omp_mock flag ## Problem `fblearner/m4/service:m4_service` failed to link in `mode/opt` + `fbcode.use_link_groups=true` with duplicate FAISS symbols (`faiss::FaissException`, `faiss::WorkerThread`, `faiss::RandomGenerator`, ...). Root cause: `//faiss:faiss`, `//faiss:faiss_no_multithreading`, `//faiss:faiss_omp_mock`, and the `*_instrumented_*` variants all compile the *same* FAISS sources with different OpenMP flags, so each emits the full set of FAISS symbols. Any binary that links two of them gets an ODR / duplicate-symbol error. `m4_service` reached three at once (`:faiss` via mui, `:faiss_omp_mock` via Laser KNN, `:faiss_no_multithreading` via m4_package_manager + Laser FANN/LCC + Galileo). It only surfaces in merged static links (`mode/opt` + link groups); dev/shared builds resolve the duplicates at runtime, which is why it had been latent. ## The (eventual) design: one `:faiss` target + the `use_omp_mock` buckconfig The FAISS-team-sanctioned mechanism (comment at the top of `faiss/BUCK`, added in D103765931) is: a binary that pulls FAISS through multiple paths uses the single canonical `//faiss:faiss` everywhere, and gets OpenMP-disabled behavior — when needed — by building with `-c faiss.use_omp_mock=true`, NOT by depending on a separate target. Why the flag is better when you control the build (a service deployed through a single fbpkg): - **ODR-safe by construction.** There is only ever one `:faiss` target in the graph, so two FAISS compilations can never be co-linked. A separate `*_omp_mock` / `*_no_multithreading` target is a footgun for any *shared library*: the moment a consumer of that lib also pulls canonical `:faiss` (via velox/caffe2/mui/etc.), the binary links two FAISS copies and breaks. That is exactly the m4_service failure, and it is structurally impossible with the flag. - **Single control point.** OpenMP policy becomes a property of the *binary owner* (one `buck_opts.config` entry on the fbpkg), not behavior baked into a shared library that silently forces it on every downstream consumer. - **Zero-overhead, verified equivalent.** With the flag, `:faiss` compiles with `omp_mock.h` (every `omp_*` call becomes a compile-time constant), `-fno-openmp`, and sequential MKL — byte-equivalent OpenMP elimination to `:faiss_omp_mock`. Verified with `nm`: canonical `:faiss` has 28 `__kmpc_*`/`__kmp_*` runtime references; with `-c faiss.use_omp_mock=true` it has 0. - **Lets the redundant targets be deleted** (follow-up diff), shrinking FAISS's surface. ## Why we are removing the `*_no_multithreading` targets specifically `faiss_no_multithreading` is the *weaker, known-insufficient* form of OpenMP-off. It only passes `-fno-openmp` (disables `#pragma omp` regions) but still includes the real `omp.h`, still links `libomp`, and still executes FAISS's explicit `omp_*()` runtime calls — so OpenMP thread *registration* and the bootstrap lock still run. That registration is precisely what caused the production crashes that motivated the mock (per D87447901 / SEVs S583356, S596410, where MySQL crashed under high concurrency *even with* `faiss_no_multithreading`). `:faiss` + `use_omp_mock` is strictly more complete (it also no-ops the runtime calls and drops `libomp` entirely), so every `no_multithreading` consumer is better served by the flag. Keeping the target only invites new ODR breakage like this one. (`omp_mock` no-ops OpenMP *locks*; safe for the read-path / centroid-assignment / PQ-encode serving uses here. Consumers doing concurrent HNSW graph mutation that rely on FAISS's internal OMP locks should validate — none in this diff's scope do.) ## What this diff does Repoints every FAISS-variant consumer to canonical `//faiss:faiss` / `//faiss:faiss_instrumented` (BUCK deps + `manual` include annotations), and adds `faiss.use_omp_mock=true` to each affected *serving* fbpkg so OpenMP-off behavior is preserved. Consumers repointed: laser (knn, fann_storage, lcc), galileo (common, query, unifiedindex), lcc (encoder/compute/processor), soma/apps/faiss_knn, vector_search test service, assistant knowledge search, qi reducer_transform test, commerce retrieval, fblearner/m4 (resource + subgraph_runner/spark_transformer, incl. removing its now-dead `faiss-no-mt` link-group workaround), faiss perf_tests. Serving fbpkgs given `use_omp_mock=true` (preserving prior OpenMP-off): laser `db_leaf` (`laser/BUCK`), laser indexer (`laser/BUCK`), galileo leaf (`galileo/leafservice/BUCK`), galileo offline indexer (`galileo/offline/BUCK`), WH Eunomia Hermes update service (`eunomia/v2/hermes/BUCK`), SIP/lcc (`pubcontent/sip/service/BUCK`), soma import tools (`soma/tools/import/BUCK`), vectordb test svc. Eunomia Scribe update-service already has it (D107764083). M4 binaries (m4_service, spark_transformer) intentionally stay OpenMP-on `:faiss` — m4 is not the OpenMP-contention path. An independent completeness audit of the serving binaries that link the repointed libs surfaced three fbpkgs initially missed (WH Eunomia Hermes, galileo offline indexer, laser indexer) — now flagged above. The control-plane `laser.py.laser_admin` CLI also links these libs and was OpenMP-off before; it is left OpenMP-on (not a high-QPS server) — laser owners should confirm or add the flag if desired. Each flagged serving binary was verified to link exactly one FAISS compilation (no hidden velox/caffe2 OpenMP-on faiss path that the binary-global flag would silently mock). No FAISS targets are deleted here (stacked follow-up does that). `//faiss:faiss_omp_mock` is intentionally retained for MySQL/XDB — see the follow-up diff for why it cannot move to the flag. Note for reviewers — each owning team please confirm the `use_omp_mock=true` addition on your serving fbpkg matches your deployment: laser (`db_leaf`), galileo (leaf), lcc/SIP, soma (import tools), vector_search. Differential Revision: D108092579 --- perf_tests/bench_no_multithreading_rcq_search.cpp | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/perf_tests/bench_no_multithreading_rcq_search.cpp b/perf_tests/bench_no_multithreading_rcq_search.cpp index e684d6a60f..6f28aa202c 100644 --- a/perf_tests/bench_no_multithreading_rcq_search.cpp +++ b/perf_tests/bench_no_multithreading_rcq_search.cpp @@ -8,8 +8,8 @@ #include #include -#include // @manual=//faiss:faiss_no_multithreading -#include // @manual=//faiss:faiss_no_multithreading +#include // @manual=//faiss:faiss +#include // @manual=//faiss:faiss using namespace faiss; DEFINE_uint32(iterations, 20, "iterations");