[Performance] Prefetch for on-disk inverted list #5142
Unanswered
pseudo-xqr asked this question in Q&A
Replies: 2 comments
cc @pankajsingh88?
@pseudo-xqr how many vectors do you have, and at what dimension?
Summary
Enabling prefetch with the default number of prefetch threads (32) for `OnDiskInvertedLists` seems to degrade overall search performance.
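To make the question concrete, here is a simplified, self-contained model of what prefetching inverted lists with N threads does. It is not Faiss's code: `prefetch_lists_model` is a hypothetical function, the lists are plain in-memory byte vectors standing in for the memory-mapped file, and the 4096-byte page size is an assumption. Each worker claims the next list id from a shared counter and reads one byte per page so that, on a real mapping, the OS would have to fault that page in before the search scans it. Faiss's own prefetch runs in the background while the search proceeds; this sketch simply joins its threads before returning.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical model of multi-threaded list prefetching (not Faiss code).
// `lists` stands in for the inverted lists, `list_nos` for the lists a batch
// of queries will probe, and `nthread` for the number of prefetch threads.
// Returns a checksum so the page-touching reads cannot be optimized away.
std::uint64_t prefetch_lists_model(
        const std::vector<std::vector<std::uint8_t>>& lists,
        const std::vector<int>& list_nos,
        int nthread,
        std::size_t page_size = 4096) {
    std::atomic<std::size_t> next{0};  // next entry of list_nos to claim
    std::atomic<std::uint64_t> checksum{0};

    auto worker = [&] {
        // Each worker repeatedly claims one list until all are taken.
        for (std::size_t i = next++; i < list_nos.size(); i = next++) {
            const std::vector<std::uint8_t>& list = lists[list_nos[i]];
            std::uint64_t cs = 0;
            // Touch one byte per page of this list.
            for (std::size_t off = 0; off < list.size(); off += page_size) {
                cs += list[off];
            }
            checksum += cs;
        }
    };

    // Never start more threads than there are lists; nthread <= 0 starts none.
    int nt = std::min(nthread, static_cast<int>(list_nos.size()));
    std::vector<std::thread> threads;
    for (int t = 0; t < nt; ++t) {
        threads.emplace_back(worker);
    }
    for (std::thread& th : threads) {
        th.join();
    }
    return checksum.load();
}
```

With `nthread` set to 0 no workers start, which loosely mirrors running the search with the `prefetch_lists` call removed. In this model, if the probed pages are already resident in memory, the workers add only thread start-up and scanning cost. In Faiss itself, the thread count appears to be held in the `prefetch_nthread` member of `OnDiskInvertedLists`.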
Platform
OS: Linux
Faiss version: v1.13.2-7-gc9bab48d1
Running on: CPU with 64 cores
Interface: C++ (Python should work as well)
Dataset used: hotpotqa
Reproduction instructions
I prebuilt the vector database on the hotpotqa dataset with the all-MiniLM-L6-v2 model and saved the trained index with its inverted lists stored as `OnDiskInvertedLists`. The core search code is attached below:
In the `index_ivf->search` function, if I comment out the call to `prefetch_lists` in `sub_search_func`, the overall search latency is lower than with the call left in. I also tried reducing the number of prefetch threads, but performance is still worse than without prefetching. The experiments that … `search_time` captured by `ivf_stats`.
Is this performance degradation with the default 32 prefetch threads expected? Under what configuration would using 32 prefetch threads be beneficial for performance?
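Besides `search_time` from `ivf_stats`, a with/without-prefetch comparison like this can be cross-checked with end-to-end wall-clock timing. A minimal sketch, assuming a hypothetical helper `median_latency_ms` (not a Faiss API) that receives the search as a callable:

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper (not part of Faiss): calls `search` `warmup` times
// without timing, then `repeats` times with timing, and returns the median
// wall-clock latency in milliseconds (the upper middle sample when `repeats`
// is even). Returns 0.0 if `repeats` is not positive.
double median_latency_ms(const std::function<void()>& search,
                         int repeats,
                         int warmup) {
    for (int w = 0; w < warmup; ++w) {
        search();  // untimed, e.g. to bring mapped pages into the page cache
    }
    if (repeats <= 0) {
        return 0.0;
    }
    std::vector<double> samples;
    samples.reserve(static_cast<std::size_t>(repeats));
    for (int r = 0; r < repeats; ++r) {
        auto t0 = std::chrono::steady_clock::now();
        search();
        auto t1 = std::chrono::steady_clock::now();
        samples.push_back(
                std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    std::sort(samples.begin(), samples.end());
    return samples[samples.size() / 2];
}
```

For example, `median_latency_ms([&] { index_ivf->search(nq, xq, k, D, I); }, 20, 3)`, where `nq`, `xq`, `k`, `D` and `I` are placeholder names for the query count, query vectors, number of neighbors and output buffers, run once with prefetching and once without. The warm-up calls matter because `OnDiskInvertedLists` memory-maps its data file, so later runs may be served from the OS page cache rather than the disk; keeping the cache state the same for both configurations (all warm or all cold) makes the comparison fairer. The median limits the effect of occasional slow outliers.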