Vector DB + LLM chaining using LangChain with open-source models for an information retrieval system on domain-specific data. It improves on the search-engine experience by returning direct, concise answers in addition to pointing to the source document used to generate each answer. I'm using this repository to document my experiments with generative LLMs as new methods and tricks are released in open source.
- The required packages are installed at the beginning of the notebook
- The notebook was run on an Azure VM of node type Standard_NC64as_T4_v3
- Collect all documents of your corpus into a single folder, in PDF format
- The index is created by reading each document page by page; from here on, each page is referred to as a document
- Embeddings for the vector index are generated by a text embedding model; various sentence-transformers models are available to choose from here
- FAISS is used to build an index over all these vectors, and the index can be made as complex as necessary to trade off retrieval speed against retrieval accuracy
- Choose the generative model from this leaderboard
- Each model comes with its own hardware requirements for loading it and its own set of packages that were used to train it
- Most leading models on Hugging Face provide guidance on both of these, and it's best to follow that guidance before trying customizations
- Tuning generation parameters such as temperature, top_p and top_k helps control the quality of the generated text
- The question is first run against the vector index to retrieve the top-matching documents
- The number of top-k hits that can be used is limited by the context length supported by the generative LLM and by the chunking, which decides the length of each document
- A prompt template explains the task to the model and includes a few examples to set expectations for the generated tokens
- The model is also prompted to return the document identifier as a source reference, and the template gives explicit instructions on how it should be formatted
- A limited set of top-k hits is sent to the model in the prompt template, together with the question, to generate the answer
- Since the model is prompted to follow a format when citing the reference, checking whether that format was used helps discard one class of cases where the generative LLM definitely hallucinated
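The indexing and retrieval bullets above (one document per page, embeddings, a FAISS index, top-k hits for the question) can be sketched end to end in plain Python. This is a minimal sketch under stated assumptions: the corpus, the `file:page` identifiers and the term-frequency `embed` function are hypothetical stand-ins for the PDF loader and the sentence-transformers model, and the brute-force search mirrors what an exact FAISS index such as `IndexFlatIP` computes on normalised vectors.

```python
import math
import re
from collections import Counter

# Hypothetical corpus: each "file:page" id is one document, mirroring the
# page-by-page indexing described above.
PAGES = {
    "manual.pdf:1": "Install the pump and connect the inlet valve.",
    "manual.pdf:2": "Replace the filter every six months to keep pressure stable.",
    "faq.pdf:1": "Warranty claims must be filed within one year of purchase.",
}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocab(texts) -> dict[str, int]:
    """Assign one vector dimension to every token seen in the corpus."""
    vocab: dict[str, int] = {}
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab


def embed(text: str, vocab: dict[str, int]) -> list[float]:
    """Toy term-frequency embedding, L2-normalised so that the inner product
    equals cosine similarity (stand-in for a sentence-transformers model)."""
    vec = [0.0] * len(vocab)
    for tok, count in Counter(tokenize(text)).items():
        if tok in vocab:  # tokens outside the corpus vocabulary are ignored
            vec[vocab[tok]] = float(count)
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def search(query: str, index, vocab, k: int = 2):
    """Exact (brute-force) inner-product search: score every stored vector
    and return the k best (score, doc_id) pairs."""
    q = embed(query, vocab)
    scored = [(sum(a * b for a, b in zip(q, vec)), doc_id) for doc_id, vec in index]
    scored.sort(reverse=True)
    return scored[:k]


vocab = build_vocab(PAGES.values())
index = [(doc_id, embed(text, vocab)) for doc_id, text in PAGES.items()]
hits = search("How often should I replace the filter?", index, vocab, k=2)
print(hits)
```

Approximate FAISS indexes such as `IndexIVFFlat` search only a subset of clusters per query, which is where the speed versus accuracy trade-off comes from.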
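To gauge the hardware a model needs before loading it, a common rule of thumb is parameter count times bytes per parameter. The sketch below assumes that rule; it counts weights only and ignores activations, the KV cache and framework overhead, so actual usage is higher.

```python
# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1, "int4": 0.5}


def weight_memory_gib(num_params: float, dtype: str = "float16") -> float:
    """Rough lower bound on the memory needed to hold the weights, in GiB.

    Activations, the KV cache and framework overhead are NOT included.
    """
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


for dtype in ("float32", "float16", "int8", "int4"):
    print(f"7B parameters in {dtype}: {weight_memory_gib(7e9, dtype):.1f} GiB")
```

For example, a 7B-parameter model in float16 needs roughly 13 GiB for its weights alone, which already takes most of the 16 GB on a single NVIDIA T4.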
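To make the generation-parameter bullet concrete, here is a sketch of how temperature, top_k and top_p reshape a next-token distribution, applied in that order. The logits are a hypothetical toy input; in Hugging Face `generate` the same knobs are the `temperature`, `top_k` and `top_p` arguments, which take effect when `do_sample=True`.

```python
import math
import random


def filter_distribution(logits: dict[str, float], temperature: float = 1.0,
                        top_k: int = 0, top_p: float = 1.0) -> dict[str, float]:
    """Apply temperature, then top-k, then top-p (nucleus) filtering to
    next-token logits and return the renormalised probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0")
    # Temperature: < 1 sharpens the distribution, > 1 flattens it.
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    peak = max(scaled.values())
    weights = {tok: math.exp(v - peak) for tok, v in scaled.items()}  # stable softmax
    total = sum(weights.values())
    ranked = sorted(((w / total, tok) for tok, w in weights.items()), reverse=True)

    # Top-k: keep the k most probable tokens (0 disables the filter), renormalise.
    if top_k > 0:
        ranked = ranked[:top_k]
        mass = sum(p for p, _ in ranked)
        ranked = [(p / mass, tok) for p, tok in ranked]

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for prob, tok in ranked:
        kept.append((prob, tok))
        cumulative += prob
        if cumulative >= top_p:
            break

    norm = sum(p for p, _ in kept)
    return {tok: p / norm for p, tok in kept}


def sample(dist: dict[str, float], rng: random.Random) -> str:
    """Draw one token according to the filtered probabilities."""
    tokens, probs = zip(*dist.items())
    return rng.choices(tokens, weights=probs, k=1)[0]


rng = random.Random(0)
dist = filter_distribution({"Paris": 3.0, "Lyon": 1.0, "Nice": 0.5},
                           temperature=0.7, top_k=2, top_p=0.9)
print(dist, sample(dist, rng))
```

Lower temperature and tighter top_k/top_p make answers more deterministic, which usually suits extractive question answering over retrieved pages.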
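The limit on top-k hits amounts to a token budget: the context length minus room reserved for the template, the question and the generated answer. A minimal sketch, assuming hits arrive ranked best first as `(doc_id, text)` pairs and approximating token counts by whitespace words; in practice `count_tokens` would call the generative model's own tokenizer. The helper names are hypothetical.

```python
from typing import Callable


def word_count(text: str) -> int:
    """Crude token estimate: whitespace-separated words (an assumption; the
    model's tokenizer often produces more tokens than words)."""
    return len(text.split())


def select_hits(hits: list[tuple[str, str]], context_length: int,
                reserved_tokens: int,
                count_tokens: Callable[[str], int] = word_count) -> list[tuple[str, str]]:
    """Keep ranked (doc_id, text) hits, best first, while their combined size
    fits in the budget left after reserving room for the template, question
    and generated answer."""
    budget = context_length - reserved_tokens
    selected, used = [], 0
    for doc_id, text in hits:
        cost = count_tokens(text)
        if used + cost > budget:
            break  # stop at the first hit that does not fit, preserving rank order
        selected.append((doc_id, text))
        used += cost
    return selected


ranked_hits = [("manual.pdf:2", "word " * 300), ("manual.pdf:1", "word " * 300),
               ("faq.pdf:1", "word " * 300)]
chosen = select_hits(ranked_hits, context_length=2048, reserved_tokens=1300)
print([doc_id for doc_id, _ in chosen])
```

Stopping at the first hit that does not fit keeps the selected context a strict prefix of the ranking; skipping it and trying shorter, lower-ranked hits would pack the window more tightly at the cost of relevance order.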
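The prompt-template bullets can be illustrated with a few-shot template that asks for a citation in a fixed format. The wording, the worked example, the `SOURCE: [<doc id>]` format and the `build_prompt` helper are all hypothetical, not the template used in the notebook; the same string could also be wrapped in LangChain's `PromptTemplate`.

```python
# Hypothetical few-shot template; the wording, the worked example and the
# "SOURCE: [<doc id>]" citation format are illustrative assumptions.
TEMPLATE = """Answer the question using only the context below.
End the answer with the identifier of the document you used, written exactly as SOURCE: [<doc id>].

Context:
[faq.pdf:1] Warranty claims must be filed within one year of purchase.
Question: How long do I have to file a warranty claim?
Answer: Within one year of purchase. SOURCE: [faq.pdf:1]

Context:
{context}
Question: {question}
Answer:"""


def build_prompt(question: str, hits: list[tuple[str, str]]) -> str:
    """Insert the selected top-k hits, each prefixed by its id, and the question."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in hits)
    return TEMPLATE.format(context=context, question=question)


prompt = build_prompt("When should the filter be replaced?",
                      [("manual.pdf:2", "Replace the filter every six months.")])
print(prompt)
```

Prefixing each hit with its identifier gives the model the exact string it is asked to copy into the citation.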
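The format check in the last bullet can be as simple as a regular expression. This sketch assumes the hypothetical `SOURCE: [<doc id>]` citation format and, as an extra step beyond the bullet, also rejects answers that cite an identifier not among the hits sent in the prompt.

```python
import re

# Matches the hypothetical citation format "SOURCE: [<doc id>]".
CITATION = re.compile(r"SOURCE:\s*\[([^\]]+)\]")


def check_citation(answer: str, allowed_ids: set[str]) -> bool:
    """Return True only if the answer cites at least one source in the expected
    format and every cited id was among the hits sent in the prompt."""
    cited = [doc_id.strip() for doc_id in CITATION.findall(answer)]
    return bool(cited) and all(doc_id in allowed_ids for doc_id in cited)


allowed = {"manual.pdf:2", "faq.pdf:1"}
for answer in ("Every six months. SOURCE: [manual.pdf:2]",
               "Every six months.",
               "Every six months. SOURCE: [manual.pdf:9]"):
    print(check_citation(answer, allowed), "|", answer)
```

Passing the check does not prove the answer is faithful to the cited page; it only filters out responses that ignored the instructed format or cited a page the model was never shown.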