
aryakvn/duplicate-file-finder


🔍 Duplicate File Finder

This Python script compares two directories for duplicate files and optionally generates a list of files to safely delete based on your chosen strategy (e.g., keep older or newer files). It uses file size and SHA-256 hash to efficiently and accurately detect duplicates.


🚀 Features

  • ✅ Compares two directories recursively
  • 🔒 Uses file size + SHA-256 hash to detect exact duplicates
  • 🧠 Smart strategy to choose which duplicate to delete (--strategy older|newer)
  • 📁 Optional output file listing the duplicates to delete (e.g., to pass to rm -rf)
  • 🚫 Supports exclude patterns (e.g., *.jpg,*.png) using glob syntax
  • 🧪 Safe to test before actual deletion
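
The size-then-hash approach listed above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the script's actual code: the helper names (sha256_of, files_by_size, find_duplicate_pairs) are hypothetical, and the real script may walk, group, or hash files differently.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_by_size(root):
    """Walk root recursively and group regular file paths by size in bytes."""
    groups = defaultdict(list)
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            if os.path.isfile(path):
                groups[os.path.getsize(path)].append(path)
    return groups


def find_duplicate_pairs(dir1, dir2):
    """Return sorted (path_in_dir1, path_in_dir2) pairs with identical content."""
    sizes1, sizes2 = files_by_size(dir1), files_by_size(dir2)
    pairs = []
    # Only sizes present in both trees can contain duplicates, so
    # everything else is skipped without being hashed.
    for size in sizes1.keys() & sizes2.keys():
        hashes1 = defaultdict(list)
        for path in sizes1[size]:
            hashes1[sha256_of(path)].append(path)
        for path in sizes2[size]:
            for match in hashes1.get(sha256_of(path), []):
                pairs.append((match, path))
    return sorted(pairs)
```

Comparing sizes first is a cheap filter: two files of different length cannot be identical, so the comparatively expensive hash is computed only for files whose size occurs in both directories.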

📦 Requirements

  • Python 3.6+
  • Standard library only (no third-party dependencies)

🧑‍💻 Usage

python find_duplicates.py [dir1] [dir2] [options]

🔧 Arguments

Argument  Description
dir1      First directory to scan (typically the one you want to keep files from)
dir2      Second directory to scan (typically the one to delete duplicates from)

🏁 Options

Option                         Description                                                          Default
-o, --output [file]            Output file to save list of duplicates to delete                     -
--strategy [older|newer]       Which file to delete: the older or the newer one                     newer
-e, --exclude "*.ext1,*.ext2"  Comma-separated list of glob patterns to exclude (e.g. *.jpg,*.png)  -
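
As an illustration of the --strategy option, the choice between two duplicates might look like the sketch below. It assumes that "older" and "newer" are judged by modification time and that a tie goes to the file from the second directory; neither detail is confirmed by this README, and choose_file_to_delete is a hypothetical name.

```python
import os


def choose_file_to_delete(path1, path2, strategy="newer"):
    """Pick which of two duplicate files to list for deletion.

    Assumption: age is judged by modification time (st_mtime).
    On a tie, the file from the second directory (path2) is chosen.
    """
    if strategy not in ("older", "newer"):
        raise ValueError("strategy must be 'older' or 'newer'")
    mtime1 = os.path.getmtime(path1)
    mtime2 = os.path.getmtime(path2)
    if mtime1 == mtime2:
        return path2
    older, newer = (path1, path2) if mtime1 < mtime2 else (path2, path1)
    return newer if strategy == "newer" else older
```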

🧪 Examples

Find duplicates and list the newer copy of each for deletion:

python find_duplicates.py dirA dirB --strategy newer -o duplicates.txt

Find duplicates but exclude images:

python find_duplicates.py dirA dirB -e "*.jpg,*.png" -o duplicates.txt
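
For readers curious how a comma-separated exclude list could be applied, here is a minimal sketch using the standard-library fnmatch module. It assumes patterns are matched case-sensitively against the file name only (not the full path); the script itself may behave differently, and parse_exclude and is_excluded are hypothetical names.

```python
import fnmatch
import os


def parse_exclude(option):
    """Split a comma-separated option such as "*.jpg,*.png" into patterns."""
    return [part.strip() for part in option.split(",") if part.strip()]


def is_excluded(path, patterns):
    """Return True if the file name matches any glob pattern.

    Assumption: patterns are tested against the base name only, and
    case-sensitively (fnmatchcase), not against the full path.
    """
    name = os.path.basename(path)
    return any(fnmatch.fnmatchcase(name, pattern) for pattern in patterns)


patterns = parse_exclude("*.jpg,*.png")
print(is_excluded("dirA/holiday/photo.jpg", patterns))  # True
print(is_excluded("dirA/notes.txt", patterns))          # False
```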

Delete duplicates after review:

First, check what would be deleted:

cat duplicates.txt | xargs -I {} echo Deleting: {}

Then, delete them:

cat duplicates.txt | xargs rm -rf
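
If file names may contain spaces, a small Python helper can stand in for the xargs pipeline. The sketch below assumes the output file holds one path per line, which this README does not state explicitly; delete_listed_files is a hypothetical helper, not part of the script.

```python
import os


def delete_listed_files(list_path, dry_run=True):
    """Delete each path named in list_path; return the paths acted on.

    Assumption: the list holds one file path per line. With dry_run=True
    nothing is removed and the paths are only printed.
    """
    handled = []
    with open(list_path, encoding="utf-8") as listing:
        for line in listing:
            path = line.rstrip("\n")
            if not path:
                continue
            if not os.path.isfile(path):
                print("Skipping (not a regular file): " + path)
                continue
            if dry_run:
                print("Would delete: " + path)
            else:
                os.remove(path)
                print("Deleted: " + path)
            handled.append(path)
    return handled
```

Calling delete_listed_files("duplicates.txt") previews the deletions; passing dry_run=False performs them. Unlike rm -rf, os.remove never deletes directories.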

⚠️ Safety Notes

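  • Always preview the output file before running rm -rf. By default xargs splits its input on whitespace, so the xargs rm -rf command above does not pass paths containing spaces to rm intact.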
  • This script only detects exact file duplicates (by size + SHA-256 hash).
  • It does not compare directories for similar file names or approximate content.

📂 Project Structure

duplicate-file-finder/
│
├── main.py      # Main script
└── README.md    # This file

📝 License

MIT License — free to use, modify, and share.
