Mini Search Engine

A simple desktop search engine built with Python, Tkinter, and Whoosh. This application indexes local files and allows users to search through their contents using full-text search.

Full Source Code

import os, sys, json, csv, datetime
import tkinter as tk
from tkinter import ttk, filedialog, messagebox

from whoosh import index, qparser
from whoosh.fields import Schema, TEXT, ID, DATETIME
from whoosh.analysis import StemmingAnalyzer
from whoosh.highlight import Highlighter, HtmlFormatter, ContextFragmenter
from whoosh.qparser import QueryParser, MultifieldParser
from whoosh.query import DateRange, Every, Term
import whoosh.index as windex

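# PDF support is optional: it is enabled only if pypdf is installed.
try:
    from pypdf import PdfReader
    HAS_PDF = True
except ImportError:
    HAS_PDF = False

# XLSX support is optional: it is enabled only if openpyxl is installed.
try:
    import openpyxl
    HAS_XLSX = True
except ImportError:
    HAS_XLSX = False

# The index is kept in a hidden folder in the user's home directory.
INDEX_DIR = os.path.join(os.path.expanduser("~"), ".mini_search_index")
RESULTS_PER_PAGE = 5

# One indexed document per file; "path" is the unique key.
SCHEMA = Schema(
    path     = ID(stored=True, unique=True),
    filename = TEXT(stored=True),
    filetype = ID(stored=True),
    content  = TEXT(stored=True, analyzer=StemmingAnalyzer()),
    modified = DATETIME(stored=True),
)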

Features

- Index local folders
- Full-text search using Whoosh
- Search inside:
  - TXT files
  - PDF files
  - JSON files
  - CSV files
  - XLSX files
- Keyword highlighting
- File type filtering
- Date filtering
- Pagination for results
- Index statistics window
- Simple Tkinter GUI

Requirements

Install required packages:

pip install whoosh pypdf openpyxl

How to Run

python app.py

Replace app.py with the actual script filename, and quote it if the name contains spaces.

How It Works

1. Choose a folder containing files.
2. Select the file formats to index.
3. Click Build Index.
4. Enter a search query.
5. Browse the paginated results.
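
The final step, browsing paginated results, can be sketched in plain Python as follows. This is a minimal illustration that assumes results are held in a list and that pages are numbered from 1; the function name paginate and the sample hits are hypothetical, and the application may page its results differently.

```python
import math

RESULTS_PER_PAGE = 5  # same value as the constant in the source above


def paginate(results, page, per_page=RESULTS_PER_PAGE):
    """Return (items on the 1-based page, total number of pages)."""
    total_pages = max(1, math.ceil(len(results) / per_page))
    page = min(max(page, 1), total_pages)  # clamp out-of-range page numbers
    start = (page - 1) * per_page
    return results[start:start + per_page], total_pages


hits = [f"doc{i}.txt" for i in range(1, 13)]  # 12 hypothetical hits
first_page, pages = paginate(hits, 1)
last_page, _ = paginate(hits, 3)
```

Whoosh also provides Searcher.search_page(query, pagenum, pagelen=...), which may be a better fit when the results come straight from a searcher.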

The search index is stored locally at:

~/.mini_search_index

Supported File Types

| Extension | Description |
|-----------|-------------|
| .txt | Text files |
| .pdf | PDF documents |
| .json | JSON files |
| .csv | CSV spreadsheets |
| .xlsx | Excel workbooks |
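
One way to turn these file types into searchable text is to dispatch on the file extension. The sketch below covers only the formats readable with the standard library (TXT, JSON, CSV) and leaves PDF and XLSX out; the function name extract_text and the sample files are assumptions for illustration, not the application's actual extraction code.

```python
import csv
import json
import os
import tempfile


def extract_text(path):
    """Return the searchable text of a file, or None if it cannot be read."""
    ext = os.path.splitext(path)[1].lower()
    try:
        if ext == ".txt":
            with open(path, encoding="utf-8", errors="ignore") as fh:
                return fh.read()
        if ext == ".json":
            with open(path, encoding="utf-8") as fh:
                # Re-serialize so keys and values become one searchable string.
                return json.dumps(json.load(fh), ensure_ascii=False)
        if ext == ".csv":
            with open(path, newline="", encoding="utf-8", errors="ignore") as fh:
                return "\n".join(" ".join(row) for row in csv.reader(fh))
    except (OSError, ValueError, csv.Error):
        # Unreadable or malformed files are skipped instead of aborting the run.
        return None
    return None  # unsupported extension (PDF/XLSX handling omitted here)


with tempfile.TemporaryDirectory() as tmp:
    samples = {
        "a.txt": "plain text",
        "b.json": '{"title": "invoice", "total": 42}',
        "c.csv": "name,amount\nalice,10\n",
        "d.json": "{not valid json",
    }
    extracted = {}
    for name, body in samples.items():
        file_path = os.path.join(tmp, name)
        with open(file_path, "w", encoding="utf-8", newline="") as fh:
            fh.write(body)
        extracted[name] = extract_text(file_path)
```

Returning None for a malformed file, as with d.json here, lets the caller skip it and continue with the rest of the folder.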

Search Features

The engine supports:

- Fuzzy search
- Wildcards
- Filename search
- Content search
- File type filtering
- Date range filtering

Example Searches

machine learning
report*
python~1
invoice

Project Structure

project/
│
├── app.py
├── README.md

Notes

- Hidden folders are skipped during indexing.
- Invalid or unreadable files are skipped safely.
- PDF and XLSX support is optional and depends on whether pypdf and openpyxl are installed.
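
The folder-skipping behaviour described above can be sketched with os.walk. This example assumes that "hidden" means a folder whose name starts with a dot; the function name iter_indexable_files and the sample folder layout are hypothetical.

```python
import os
import tempfile

SUPPORTED = {".txt", ".pdf", ".json", ".csv", ".xlsx"}


def iter_indexable_files(root, extensions=SUPPORTED):
    """Yield paths under root with a supported extension, skipping hidden folders."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in sorted(filenames):
            if os.path.splitext(name)[1].lower() in extensions:
                yield os.path.join(dirpath, name)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: one indexable file, one hidden folder, one image.
    for rel in ["docs/a.txt", ".cache/b.txt", "docs/c.png"]:
        full = os.path.join(tmp, *rel.split("/"))
        os.makedirs(os.path.dirname(full), exist_ok=True)
        open(full, "w").close()
    found = [
        os.path.relpath(p, tmp).replace(os.sep, "/")
        for p in iter_indexable_files(tmp)
    ]
```

Only docs/a.txt is yielded here: the .cache folder is pruned before os.walk enters it, and c.png has an unsupported extension.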

Technologies Used

- Python
- Tkinter
- Whoosh
- pypdf
- OpenPyXL

License

This project is open-source and free to use.
