This project is a search engine developed for the Multimedia Information Retrieval and Computer Vision course, part of the Master's program in Artificial Intelligence and Data Engineering at the University of Pisa (academic year 2023/2024). Each folder in this repository includes a README.md file detailing the classes it contains.
In the project directory, you will find two JAR files:
- BuildIndex.jar: Used for creating the search index.
  - To Run:
    java -jar .\BuildIndex.jar encodingType compressDocID compressFreq dimBlock scoringFunction stopWordRemoval stemming
  - Parameters:
    - encodingType: str ("bin" or "text"), specifies the type of encoding to be used.
    - compressDocID: str ("none" or "variablebyte"), how document IDs will be compressed.
    - compressFreq: str ("none" or "variablebyte" or "unarycode"), how frequencies will be compressed.
    - dimBlock: int, specifies the dimension of the block.
    - scoringFunction: str ("TFIDF" or "BM25" or "BM11" or "BM15"), specifies the scoring function to be used.
    - stopWordRemoval: boolean, if true, stopwords will be removed.
    - stemming: boolean, if true, stemming will be performed.
- Query.jar: Used for executing queries on the search engine.
  - To Run:
    java -jar .\Query.jar numResults scoringFunction queryMode compressDocID compressFreq docProcessor dimBlock stopWordRemoval stemming
  - Parameters:
    - numResults: int, specifies the number of documents retrieved by a query.
    - scoringFunction: str ("TFIDF" or "BM25" or "BM11" or "BM15"), specifies the scoring function to be used.
    - queryMode: str ("conjunctive" or "disjunctive"), specifies how the query will be performed.
    - compressDocID: str ("none" or "variablebyte" or "unarycode"), how document IDs will be compressed.
    - compressFreq: str ("none" or "variablebyte" or "unarycode"), how frequencies will be compressed.
    - docProcessor: str ("DAAT" or "MaxScore"), specifies how the documents will be processed.
    - dimBlock: int, specifies the dimension of the block.
    - stopWordRemoval: boolean, if true, stopwords will be removed.
    - stemming: boolean, if true, stemming will be performed.
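
The scoringFunction values accepted by both JARs are closely related: BM11 and BM15 are usually defined as the two extremes of BM25's document-length normalization (b = 1 and b = 0). The sketch below illustrates that relationship under stated assumptions. The class name ScoringSketch, the k1 = 1.2 and b = 0.75 defaults, and the specific TFIDF and IDF variants are illustrative choices, not taken from the project's code.

```java
import java.util.Locale;

/** Illustrative sketch of the BM25 family; not the project's implementation. */
final class ScoringSketch {

    // Assumed default for the term-frequency saturation parameter.
    static final double K1 = 1.2;

    /** Inverse document frequency, here log10(N / df) (one common variant). */
    static double idf(long numDocs, long docFreq) {
        return Math.log10((double) numDocs / docFreq);
    }

    /** TFIDF weight: (1 + log10(tf)) * idf, or 0 when the term is absent. */
    static double tfidf(int tf, long numDocs, long docFreq) {
        if (tf <= 0) return 0.0;
        return (1.0 + Math.log10(tf)) * idf(numDocs, docFreq);
    }

    /** BM25 weight; b in [0, 1] controls how strongly long documents are penalized. */
    static double bm25(int tf, int docLen, double avgDocLen,
                       long numDocs, long docFreq, double b) {
        if (tf <= 0) return 0.0;
        double norm = K1 * ((1.0 - b) + b * docLen / avgDocLen);
        return tf / (norm + tf) * idf(numDocs, docFreq);
    }

    /** Maps a scoringFunction value to a term weight (b = 0.75 for BM25 is assumed). */
    static double score(String scoringFunction, int tf, int docLen, double avgDocLen,
                        long numDocs, long docFreq) {
        switch (scoringFunction.toUpperCase(Locale.ROOT)) {
            case "TFIDF": return tfidf(tf, numDocs, docFreq);
            case "BM25":  return bm25(tf, docLen, avgDocLen, numDocs, docFreq, 0.75);
            case "BM11":  return bm25(tf, docLen, avgDocLen, numDocs, docFreq, 1.0);
            case "BM15":  return bm25(tf, docLen, avgDocLen, numDocs, docFreq, 0.0);
            default:
                throw new IllegalArgumentException("Unknown scoring function: " + scoringFunction);
        }
    }

    public static void main(String[] args) {
        // A document twice as long as average: BM11 penalizes it most, BM15 not at all.
        for (String f : new String[] {"TFIDF", "BM25", "BM11", "BM15"}) {
            System.out.println(f + " -> " + score(f, 3, 200, 100.0, 1000, 10));
        }
    }
}
```

For a document of exactly average length the three BM variants coincide in this sketch; they only diverge as the document length moves away from the average.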
The repository also includes the documentation (Documentazione.pdf), which outlines the structure of the project, lists the design choices made, and reports the results obtained with the search engine. In addition, each Java class is commented to explain the code in detail.
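
As a companion to the "variablebyte" option of compressDocID and compressFreq, here is a minimal sketch of one common variable-byte convention: 7 payload bits per byte, with the high bit signalling that more bytes follow. The class VariableByteSketch and its methods are hypothetical, and the byte layout actually used by the project may differ; Documentazione.pdf remains the reference for the real design.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

/** Illustrative variable-byte codec; not the project's implementation. */
final class VariableByteSketch {

    /** Appends one non-negative int to the stream, least significant 7 bits first. */
    private static void writeValue(ByteArrayOutputStream out, int value) {
        if (value < 0) {
            throw new IllegalArgumentException("value must be non-negative");
        }
        while (value >= 128) {
            out.write((value & 0x7F) | 0x80); // high bit set: more bytes follow
            value >>>= 7;
        }
        out.write(value); // high bit clear: last byte of this value
    }

    /** Encodes a single value. */
    static byte[] encode(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeValue(out, value);
        return out.toByteArray();
    }

    /** Encodes a sequence of values back to back. */
    static byte[] encodeAll(int[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int v : values) {
            writeValue(out, v);
        }
        return out.toByteArray();
    }

    /** Decodes every value stored in the byte array. */
    static List<Integer> decodeAll(byte[] bytes) {
        List<Integer> values = new ArrayList<>();
        int current = 0;
        int shift = 0;
        for (byte b : bytes) {
            current |= (b & 0x7F) << shift;
            if ((b & 0x80) != 0) {
                shift += 7;          // continuation byte: keep accumulating
            } else {
                values.add(current); // final byte: emit the value and reset
                current = 0;
                shift = 0;
            }
        }
        return values;
    }

    public static void main(String[] args) {
        int[] docIds = {1, 127, 128, 16384};
        byte[] packed = encodeAll(docIds);
        System.out.println(packed.length + " bytes instead of " + 4 * docIds.length);
        System.out.println(decodeAll(packed));
    }
}
```

Small values fit in a single byte, which is why this kind of scheme tends to pay off for frequencies and for document IDs stored as small numbers.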