This Python script compares two directories for duplicate files and optionally generates a list of files to safely delete based on your chosen strategy (e.g., keep older or newer files). It uses file size and SHA-256 hash to efficiently and accurately detect duplicates.
- ✅ Compares two directories recursively
- 🔒 Uses file size + SHA-256 hash to detect exact duplicates
- 🧠 Smart strategy to choose which duplicate to delete (`--strategy older|newer`)
- 📁 Optional output file listing the duplicates to delete (e.g., with `rm -rf`)
- 🚫 Supports exclude patterns (e.g., `*.jpg,*.png`) using glob syntax
- 🧪 Safe to test before actual deletion
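
The size-then-hash detection listed above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function names `sha256_of`, `walk_files`, and `find_duplicates` are hypothetical, and the sketch assumes that a duplicate means a file in the second directory whose bytes match a file in the first.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def walk_files(root):
    """Yield every regular file path under root, recursively."""
    for folder, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            if os.path.isfile(path):
                yield path


def find_duplicates(dir1, dir2):
    """Return (path_in_dir1, path_in_dir2) pairs with identical content."""
    # Group dir1 files by size so that only same-size files get hashed.
    by_size = defaultdict(list)
    for path in walk_files(dir1):
        by_size[os.path.getsize(path)].append(path)

    hash_cache = {}  # path -> digest, so each dir1 file is hashed at most once
    pairs = []
    for path2 in walk_files(dir2):
        candidates = by_size.get(os.path.getsize(path2))
        if not candidates:
            continue  # no dir1 file has this size, so hashing is unnecessary
        digest2 = sha256_of(path2)
        for path1 in candidates:
            if path1 not in hash_cache:
                hash_cache[path1] = sha256_of(path1)
            if hash_cache[path1] == digest2:
                pairs.append((path1, path2))
    return pairs
```

Comparing sizes first is what keeps the scan cheap: files with a unique size are never read, and only the remaining candidates pay the cost of a full SHA-256 pass.
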
- Python 3.6+
- Standard library only (no third-party dependencies)
`python find_duplicates.py [dir1] [dir2] [options]`

| Argument | Description |
|---|---|
| `dir1` | First directory to scan (typically the one you want to keep files from) |
| `dir2` | Second directory to scan (typically the one to delete duplicates from) |

| Option | Description | Default |
|---|---|---|
| `-o, --output [file]` | Output file to save list of duplicates to delete | - |
| `--strategy [older\|newer]` | Which file to delete: the older or the newer one | `newer` |
| `-e, --exclude "*.ext1,*.ext2"` | Comma-separated list of glob patterns to exclude (e.g. `*.jpg,*.png`) | - |
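
As a rough illustration of how the options in the table might be wired together, the sketch below builds an `argparse` parser with the same flags and adds two small helpers. The helper names (`build_parser`, `parse_patterns`, `is_excluded`, `pick_file_to_delete`) are hypothetical. It also assumes that "older" and "newer" are judged by modification time and that exclude patterns are matched against the file name only; the actual script may differ on both points.

```python
import argparse
import fnmatch
import os


def build_parser():
    """Build a parser mirroring the documented arguments and options."""
    parser = argparse.ArgumentParser(
        description="Find duplicate files in two directories.")
    parser.add_argument("dir1", help="First directory to scan")
    parser.add_argument("dir2", help="Second directory to scan")
    parser.add_argument("-o", "--output",
                        help="File to write the deletion list to")
    parser.add_argument("--strategy", choices=["older", "newer"],
                        default="newer",
                        help="Which copy of each duplicate pair to delete")
    parser.add_argument("-e", "--exclude", default="",
                        help='Comma-separated glob patterns, e.g. "*.jpg,*.png"')
    return parser


def parse_patterns(raw):
    """Split a comma-separated pattern string, dropping empty entries."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def is_excluded(path, patterns):
    """Return True if the file name matches any glob pattern (assumption)."""
    name = os.path.basename(path)
    return any(fnmatch.fnmatch(name, pattern) for pattern in patterns)


def pick_file_to_delete(path1, path2, strategy):
    """Return the older or newer path, judged by modification time (assumption)."""
    first_is_older = os.path.getmtime(path1) <= os.path.getmtime(path2)
    older, newer = (path1, path2) if first_is_older else (path2, path1)
    return older if strategy == "older" else newer
```

With this sketch, `build_parser().parse_args(["dirA", "dirB"])` yields `strategy == "newer"` and `output is None`, matching the defaults in the table.
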
`python find_duplicates.py dirA dirB --strategy newer -o duplicates.txt`

`python find_duplicates.py dirA dirB -e "*.jpg,*.png" -o duplicates.txt`

First, check what would be deleted:

`cat duplicates.txt | xargs -I {} echo Deleting: {}`

Then, delete them:

`cat duplicates.txt | xargs rm -rf`

- Always preview the output file before running `rm -rf`.
- This script only detects exact file duplicates (by size + SHA-256 hash).
- It does not compare directories for similar file names or approximate content.
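
The preview-then-delete workflow can also be sketched in Python. This has one practical advantage: by default `xargs` splits its input on whitespace, so a path containing spaces would be broken into several arguments. The sketch below assumes the output file holds one path per line, which is an assumption about the file format rather than something the script documents. The function names are hypothetical.

```python
import os


def read_deletion_list(list_path):
    """Return the non-empty lines of the list file as paths."""
    with open(list_path, encoding="utf-8") as handle:
        return [line.rstrip("\r\n") for line in handle if line.strip()]


def delete_listed_files(list_path, dry_run=True):
    """Print each listed path; remove it only when dry_run is False."""
    removed = []
    for path in read_deletion_list(list_path):
        if not os.path.isfile(path):
            print("Skipping (not a file):", path)
            continue
        print("Would delete:" if dry_run else "Deleting:", path)
        if not dry_run:
            os.remove(path)  # removes a single file, never a directory
            removed.append(path)
    return removed
```

Calling `delete_listed_files("duplicates.txt")` only prints what would be removed; passing `dry_run=False` performs the deletion. Unlike `rm -rf`, `os.remove` refuses to delete directories, which limits the damage if the list contains an unexpected entry.
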
duplicate-file-finder/
│
├── main.py # Main script
└── README.md # This file
MIT License — free to use, modify, and share.