look into observed weak ivf-pq indexing-time performance #143

@nvzm123

Try to reproduce the following performance on 26.02 and check whether the problems disappear on the latest branches/PRs.

Image
Vector Dimensions: 64

  Vector Similarity: DOT_PRODUCT (vectors are assumed pre-normalized, per the code comment)

  GPU (cuVS CAGRA) Settings — from StandaloneIndexer.createIndexWriterConfig():

  - CagraGraphBuildAlgo: IVF_PQ (the enum value, no sub-parameters like nlist/nprobe/PQ bits are set — they use cuVS defaults)
  - graphDegree: 32 (default is 64)
  - intermediateGraphDegree: 48 (default is 128)
  - maxConn (HNSW M): 16
  - beamWidth: 100
  - numMergeWorkers: 16
  - Merge executor: a fixed thread pool of 16

  CPU (Lucene HNSW) Settings:

  - Lucene99HnswVectorsFormat(maxConn=16, beamWidth=100, mergeWorkers=16, mergePool)
  - Codec: Lucene101Codec with BEST_SPEED mode

  IndexWriter Config (shared):

  - RAMBufferSizeMB: 30 GB (30 * 1024)
  - maxBufferedDocs: disabled (auto-flush off)
  - useCompoundFile: false
  - openMode: CREATE
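
As a rough sanity check on the 30 GB RAM buffer, the sketch below estimates how many raw 64-dimensional vectors fit in it. It assumes float32 components (4 bytes each), which the issue does not state, and it ignores all per-document and graph overhead, so the result is only an upper bound. The class and method names are hypothetical.

```java
final class RamBufferSketch {
    static final int DIMENSIONS = 64;                    // from the issue
    static final long RAM_BUFFER_MB = 30L * 1024;        // from the issue: 30 * 1024
    static final int BYTES_PER_COMPONENT = Float.BYTES;  // assumption: float32 vectors

    /** Raw payload of one vector, ignoring any index or document overhead. */
    static long rawBytesPerVector() {
        return (long) DIMENSIONS * BYTES_PER_COMPONENT;
    }

    /** Upper bound on the number of raw vectors that fit in the RAM buffer. */
    static long maxRawVectorsInBuffer() {
        long bufferBytes = RAM_BUFFER_MB * 1024 * 1024;
        return bufferBytes / rawBytesPerVector();
    }

    public static void main(String[] args) {
        System.out.println("bytes per vector: " + rawBytesPerVector());
        System.out.println("max raw vectors:  " + maxRawVectorsInBuffer());
    }
}
```

Under these assumptions the buffer holds on the order of 10^8 raw vectors, which is consistent with the GPU path being able to keep all docs in RAM until close().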

  Flush / Commit Behavior:

  - GPU mode: commits are skipped during loading (!useGpu guard on line ~180 of StandaloneIndexer). All docs accumulate in RAM, then close() triggers a single commit() → forceMerge(4) → commit().
  - CPU mode: commits every 200,000 docs (COMMIT_BATCH_SIZE = 200000)
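
The difference in commit behavior can be sketched as follows. This is not the StandaloneIndexer code: the modulo form of the `!useGpu` guard is an assumption, and the class and method names are hypothetical. Only COMMIT_BATCH_SIZE and the close() sequence for GPU mode come from the description above.

```java
import java.util.List;

final class CommitPolicySketch {
    static final int COMMIT_BATCH_SIZE = 200_000; // from the issue

    /** Counts the commits issued while loading totalDocs documents. */
    static int commitsDuringLoading(boolean useGpu, int totalDocs) {
        int commits = 0;
        for (int indexed = 1; indexed <= totalDocs; indexed++) {
            // Assumed shape of the guard: GPU mode never commits mid-load.
            if (!useGpu && indexed % COMMIT_BATCH_SIZE == 0) {
                commits++;
            }
        }
        return commits;
    }

    /** Steps triggered by close() in GPU mode, as described in the issue. */
    static List<String> gpuCloseSequence() {
        return List.of("commit", "forceMerge(4)", "commit");
    }

    public static void main(String[] args) {
        int docs = 1_000_000;
        System.out.println("CPU commits during loading: " + commitsDuringLoading(false, docs));
        System.out.println("GPU commits during loading: " + commitsDuringLoading(true, docs));
        System.out.println("GPU close(): " + gpuCloseSequence());
    }
}
```

For one million documents this gives five mid-load commits on the CPU path and none on the GPU path, so the two modes are not flushing and merging on the same schedule when their timings are compared.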

  Indexing Threads:

  - GPU: 2 threads (code says useGpu ? 2 : MERGE_WORKERS), though the Quip doc notes 1 thread was optimal and 2 was slower
  - CPU: 16 threads

  Queue size: 150,000 documents
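
A minimal sketch of the threading setup described above, assuming the queue is a bounded producer/consumer queue between a reader and the indexing threads (the issue only gives its size). The thread-count expression follows the quoted `useGpu ? 2 : MERGE_WORKERS`; the poison-pill shutdown, the integer stand-in for documents, and all class and method names are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class IndexingPipelineSketch {
    static final int MERGE_WORKERS = 16;   // from the issue
    static final int QUEUE_SIZE = 150_000; // from the issue
    static final int POISON_PILL = -1;     // hypothetical end-of-stream marker

    static int indexingThreads(boolean useGpu) {
        return useGpu ? 2 : MERGE_WORKERS;
    }

    /** Pushes totalDocs fake documents through the queue; returns how many were consumed. */
    static int run(boolean useGpu, int totalDocs) {
        int threads = indexingThreads(useGpu);
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(QUEUE_SIZE);
        AtomicInteger indexed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                try {
                    while (queue.take() != POISON_PILL) {
                        indexed.incrementAndGet(); // stand-in for IndexWriter.addDocument
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        try {
            for (int doc = 0; doc < totalDocs; doc++) {
                queue.put(doc); // blocks when the queue is full
            }
            for (int t = 0; t < threads; t++) {
                queue.put(POISON_PILL); // one marker per consumer thread
            }
            pool.shutdown();
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("indexing threads did not finish");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while indexing", e);
        }
        return indexed.get();
    }

    public static void main(String[] args) {
        System.out.println("GPU threads: " + indexingThreads(true));
        System.out.println("CPU threads: " + indexingThreads(false));
        System.out.println("docs consumed (GPU mode): " + run(true, 10_000));
    }
}
```

Keeping the thread count in one function makes it easy to test the Quip doc's observation that 1 GPU indexing thread was faster than 2 without touching the rest of the pipeline.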

  IVF_PQ sub-parameters: The code uses IVF_PQ as the CagraGraphBuildAlgo enum but does not set any IVF_PQ-specific parameters (like nlist, nprobe, PQ sub-quantizers, PQ bits). These all use cuVS defaults. The AcceleratedHNSWParams.Builder only exposes withCagraGraphBuildAlgo, withGraphDegree, withIntermediateGraphDegree, withMaxConn, withBeamWidth, withNumMergeWorkers, and withMergeExecutorService.

My hypothesis is that most of these issues are resolved by using better IVF_PQ settings and by switching to the latest main branch for cuvs and cuvs-lucene (with my PRs for cuvs-lucene: #141, #140). There are also some timing issues that we need to fix in the SearchScale benchmarking repo (SearchScale/vectorsearch-benchmarks#14, SearchScale/vectorsearch-benchmarks#16). I will verify this hypothesis (#143), or continue investigating, after resolving SearchScale/vectorsearch-benchmarks#17.
