🔍 Duplicate File Finder

This Python script scans two directories for duplicate files and can optionally write a list of the duplicates to delete, chosen by the strategy you select (delete the older or the newer copy). It uses file size and SHA-256 hashes to detect duplicates efficiently and accurately.

🚀 Features

✅ Compares two directories recursively
🔒 Uses file size + SHA-256 hash to detect exact duplicates
🧠 Smart strategy to choose which duplicate to delete (--strategy older|newer)
📁 Optional output file listing the duplicates to delete (e.g., to pass to rm)
🚫 Supports exclude patterns (e.g., *.jpg,*.png) using glob syntax
🧪 Safe to test before actual deletion
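
The size + SHA-256 check can be sketched roughly as follows. This is a minimal illustration, not the code in main.py: the function names (`sha256_of`, `walk_files`, `find_duplicate_pairs`) and the 1 MiB chunk size are assumptions made for the example. Files are grouped by size first, and only files that share a size are hashed.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def walk_files(root):
    """Yield every regular file path under root, recursively."""
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            if os.path.isfile(path):
                yield path


def find_duplicate_pairs(dir1, dir2):
    """Return (path_in_dir1, path_in_dir2) pairs with identical content.

    Hypothetical helper for illustration; the real script may differ.
    """
    # Step 1: index dir1 by size, which costs only a stat() per file.
    by_size = defaultdict(list)
    for path in walk_files(dir1):
        by_size[os.path.getsize(path)].append(path)

    pairs = []
    hash_cache = {}  # avoids re-hashing the same dir1 file
    for candidate in walk_files(dir2):
        same_size = by_size.get(os.path.getsize(candidate))
        if not same_size:
            continue  # no dir1 file has this size, so it cannot be a duplicate
        # Step 2: hash only the files that share a size.
        candidate_hash = sha256_of(candidate)
        for original in same_size:
            if original not in hash_cache:
                hash_cache[original] = sha256_of(original)
            if hash_cache[original] == candidate_hash:
                pairs.append((original, candidate))
    return pairs
```

Checking sizes first keeps the expensive hashing step limited to files that could actually match.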

📦 Requirements

Python 3.6+
Standard library only (no third-party dependencies)

🧑‍💻 Usage

python main.py [dir1] [dir2] [options]

🔧 Arguments

Argument	Description
`dir1`	First directory to scan (typically the one you want to keep files from)
`dir2`	Second directory to scan (typically the one to delete duplicates from)

🏁 Options

Option	Description	Default
`-o, --output [file]`	File to write the list of duplicates to delete	-
`--strategy [older|newer]`	Which copy of each duplicate to delete: the `older` or the `newer` one	newer
`-e, --exclude "*.ext1,*.ext2"`	Comma-separated list of glob patterns to exclude (e.g. `*.jpg,*.png`)	-
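
To illustrate what `--strategy` decides, here is a small sketch. It assumes that "older" and "newer" refer to the file's modification time, which this README does not state explicitly, and the function name `choose_file_to_delete` is hypothetical.

```python
import os


def choose_file_to_delete(path_a, path_b, strategy="newer"):
    """Return the path that the given strategy marks for deletion.

    Hypothetical helper for illustration; the real script may differ.
    """
    if strategy not in ("older", "newer"):
        raise ValueError("strategy must be 'older' or 'newer'")
    # Assumption: file age is judged by modification time (st_mtime).
    mtime_a = os.path.getmtime(path_a)
    mtime_b = os.path.getmtime(path_b)
    if strategy == "newer":
        # Delete the more recently modified copy; ties fall to path_b.
        return path_a if mtime_a > mtime_b else path_b
    # strategy == "older": delete the less recently modified copy.
    return path_a if mtime_a < mtime_b else path_b
```

In this sketch a tie in timestamps resolves to the second path, which matches the convention of keeping files from `dir1` and deleting from `dir2`.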

🧪 Examples

Find duplicates and list the newer copy of each for deletion:

python main.py dirA dirB --strategy newer -o duplicates.txt

Find duplicates but exclude images:

python main.py dirA dirB -e "*.jpg,*.png" -o duplicates.txt
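
For reference, glob-style exclusion of this kind can be done with the standard library's `fnmatch` module. The sketch below assumes patterns are matched against the file name only, not the full path. The helper names `parse_exclude` and `is_excluded` are hypothetical and not taken from the script.

```python
import fnmatch
import os


def parse_exclude(option_value):
    """Split a comma-separated option such as "*.jpg,*.png" into patterns."""
    if not option_value:
        return []
    return [part.strip() for part in option_value.split(",") if part.strip()]


def is_excluded(path, patterns):
    """True when the file name matches any of the glob patterns."""
    # Assumption: patterns are tested against the base name, not the full path.
    name = os.path.basename(path)
    # fnmatchcase keeps matching case-sensitive on every platform.
    return any(fnmatch.fnmatchcase(name, pattern) for pattern in patterns)
```

With `patterns = parse_exclude("*.jpg,*.png")`, a file named `cat.jpg` is skipped while `notes.txt` is still scanned.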

Delete duplicates after review:

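First, check what would be deleted:

while IFS= read -r f; do echo "Would delete: $f"; done < duplicates.txt

Then delete them, reading the list line by line so that paths containing spaces stay intact:

while IFS= read -r f; do rm -f -- "$f"; done < duplicates.txt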
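
As an alternative to the shell commands above, the same review-then-delete flow can be sketched in Python. This assumes the output file is UTF-8 and holds one path per line. The function `delete_listed_files` is a hypothetical helper, not part of the script.

```python
import os


def delete_listed_files(list_path, dry_run=True):
    """Delete each path listed in list_path; return the paths handled.

    Hypothetical helper; assumes one path per line in the list file.
    """
    handled = []
    with open(list_path, encoding="utf-8") as handle:
        for line in handle:
            path = line.rstrip("\n")
            if not path:
                continue  # skip blank lines
            if not os.path.isfile(path):
                print("Skipping (not a regular file):", path)
                continue
            if dry_run:
                print("Would delete:", path)
            else:
                os.remove(path)  # removes one file, never a directory tree
                print("Deleted:", path)
            handled.append(path)
    return handled
```

Calling `delete_listed_files("duplicates.txt")` only prints what would be removed. Passing `dry_run=False` performs the deletion.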

⚠️ Safety Notes

Always preview the output file before deleting anything listed in it.
This script only detects exact file duplicates (by size + SHA-256 hash).
It does not compare directories for similar file names or approximate content.

📂 Project Structure

duplicate-file-finder/
│
├── main.py      # Main script
└── README.md    # This file

📝 License

MIT License — free to use, modify, and share.
