
Archive Integrity and Preservation Toolkit (AIPT)

Cross-platform digital archive integrity, preservation and deduplication toolkit.

AIPT is a production-quality, preservation-first CLI tool for safely auditing and preserving mixed digital archives on macOS, Linux, and Windows. It prioritizes safety, conservative validation, and minimization of false positives over aggressive deletion.

No files are ever automatically deleted. Instead, problematic files are quarantined with full restoration capabilities.
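As a rough illustration of the quarantine-and-manifest idea, the Python sketch below moves a file into a quarantine folder and appends a record to quarantine_manifest.json. The folder name _quarantine, the manifest's location, its fields (original, quarantined, reason, timestamp) and the quarantine helper are all assumptions for illustration, not AIPT's actual layout or code.

```python
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical layout: the quarantine folder and manifest live under the archive root.
QUARANTINE_DIR = "_quarantine"
MANIFEST_NAME = "quarantine_manifest.json"


def quarantine(archive_root: Path, file_path: Path, reason: str) -> Path:
    """Move file_path into the quarantine folder and record the move in the manifest."""
    archive_root = archive_root.resolve()
    file_path = file_path.resolve()
    rel = file_path.relative_to(archive_root)          # keep the original relative path
    target = archive_root / QUARANTINE_DIR / rel
    target.parent.mkdir(parents=True, exist_ok=True)
    if target.exists():
        raise FileExistsError(f"refusing to overwrite {target}")
    shutil.move(str(file_path), str(target))            # move, never delete

    # Append a restorable record; a missing manifest starts as an empty list.
    manifest_path = archive_root / QUARANTINE_DIR / MANIFEST_NAME
    entries = json.loads(manifest_path.read_text()) if manifest_path.exists() else []
    entries.append({
        "original": str(rel),
        "quarantined": str(target.relative_to(archive_root)),
        "reason": reason,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })
    manifest_path.write_text(json.dumps(entries, indent=2))
    return target
```

Storing paths relative to the archive root keeps the manifest valid if the whole archive is later moved or mounted elsewhere.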


The Problem Solved

Digital archives (photos, videos, documents) often suffer from silent corruption, OS-level clutter, and duplication over time. Simple scripts or aggressive deduplication tools can cause data loss through false positives or hasty deletions.

AIPT solves this by providing a highly conservative, multi-tier validation system. It safely audits junk, verifies file integrity (using hardware-accelerated video decoding when available), accurately finds exact duplicates, and performs perceptual best-version resolution.

Features

  • Junk/Clutter Audit: Safely quarantines OS clutter (.DS_Store, Thumbs.db, __MACOSX, etc.).
  • Conservative Integrity Scanning:
    • Images: Zero-byte, truncation, and Pillow verification.
    • Videos & Audio: 3-tier validation (fast metadata probe → keyframe-only decode → full decode fallback). Benign warnings are classified separately from fatal corruption.
    • Documents: Deep structural validation for PDFs (pypdf) and Office/ZIP files (zipfile), plus zero-byte text checking.
  • Hardware Acceleration: Autodetects and utilizes videotoolbox, cuda, nvdec, qsv, dxva2, d3d11va, and vaapi for video processing.
  • Hybrid Exact Deduplication: 4-stage pipeline (Size → Sample BLAKE2b/MD5 → Full hash) to avoid disk thrashing on large archives.
  • Perceptual Best Version Resolution: dHash-based grouping to keep the highest quality version of an image and quarantine the rest.
  • Empty Folder Cleanup: Safely removes empty directories left behind.
  • Safe Quarantine & Restore: Every action is logged to quarantine_manifest.json, which enables a robust aipt restore workflow.
  • Smart Runtime Detection: Automatically configures GPU backends and optimal worker counts to prevent disk thrashing.
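The size → sample hash → full hash narrowing used for exact deduplication can be sketched in Python as follows. The sampling strategy (first and last 64 KiB), the helper names and the choice of BLAKE2b alone are assumptions; hashlib.md5 could be swapped in where the feature list mentions MD5.

```python
import hashlib
import os
from collections import defaultdict
from pathlib import Path

SAMPLE_BYTES = 64 * 1024  # assumption: sample = first and last 64 KiB of each file


def file_hash(path: Path, sample: bool) -> str:
    """BLAKE2b of the whole file, or of its head and tail when sample=True."""
    h = hashlib.blake2b()
    size = path.stat().st_size
    with path.open("rb") as f:
        if sample and size > 2 * SAMPLE_BYTES:
            h.update(f.read(SAMPLE_BYTES))
            f.seek(-SAMPLE_BYTES, os.SEEK_END)
            h.update(f.read(SAMPLE_BYTES))
        else:
            # Small files (or the full-hash stage): stream the whole file in 1 MiB chunks.
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
    return h.hexdigest()


def split_groups(groups, key):
    """Split each candidate group by key, keeping only buckets with 2+ files."""
    result = []
    for group in groups:
        buckets = defaultdict(list)
        for path in group:
            buckets[key(path)].append(path)
        result.extend(b for b in buckets.values() if len(b) > 1)
    return result


def find_exact_duplicates(paths):
    # This sketch skips zero-byte files instead of grouping them as duplicates.
    files = [p for p in paths if p.is_file() and p.stat().st_size > 0]
    groups = split_groups([files], lambda p: p.stat().st_size)            # 1. size
    groups = split_groups(groups, lambda p: file_hash(p, sample=True))    # 2. sample hash
    groups = split_groups(groups, lambda p: file_hash(p, sample=False))   # 3. full hash
    return groups
```

Each stage only reads files that survived the previous one, so most unique files are never read in full, which is what keeps disk I/O low on large archives.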

Safety Philosophy

  • Conservative by default: Unknown conditions result in a warning, not a corruption flag.
  • Never delete automatically: Destructive actions are replaced by a quarantine system.
  • Restorable: Every quarantined file can be restored to its original path via the manifest.
  • Timeout safety: Timeouts (especially on large video files) do not imply corruption.

🚀 Quick Start

1. Install Dependencies

Ensure you have FFmpeg installed (required for video scanning):

OS                      Command
macOS                   brew install ffmpeg
Linux (Debian/Ubuntu)   sudo apt install ffmpeg
Windows                 winget install ffmpeg

2. Install AIPT

We recommend using uv to install AIPT as a standalone tool. This makes the aipt command available globally.

# Clone the repo
git clone https://github.com/AfshanKhan/aipt.git
cd aipt

# Install as a global tool
uv tool install . --python 3.13

🛠 Usage

Once installed, you can run aipt from any directory.

The "One-Command" Workflow

The easiest way to process an archive is the run-all command. It initializes the system and runs every preservation step in the correct order.

aipt run-all /path/to/your/archive

Manual Commands

If you prefer more control, you can run individual stages:

Command             Description
aipt init           Initialize system folders
aipt audit          Quarantine junk and OS clutter
aipt integrity      Scan images, videos, audio, and documents for corruption
aipt dupes          Quarantine exact byte-for-byte duplicates
aipt best-version   Keep the highest-quality version of perceptually similar images; quarantine the rest
aipt clean          Remove empty directories
aipt restore        Move files back from quarantine
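To show roughly how dHash-based best-version resolution can work, the sketch below hashes an 8-row by 9-column grayscale grid (which you could produce with Pillow's convert("L") and resize((9, 8))) and groups images whose hashes differ by only a few bits. The THRESHOLD value, the dict fields and the "most pixels, then largest file" quality rule are assumptions, not AIPT's actual scoring.

```python
THRESHOLD = 5  # hypothetical max Hamming distance for "same picture"


def dhash(pixels: list[list[int]]) -> int:
    """Difference hash of an 8-row x 9-column grayscale grid (values 0-255)."""
    if len(pixels) != 8 or any(len(row) != 9 for row in pixels):
        raise ValueError("expected 8 rows of 9 pixels")
    bits = 0
    for row in pixels:
        # One bit per horizontal neighbour pair: 1 if brightness drops left to right.
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits  # 64-bit fingerprint


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def best_versions(images):
    """images: dicts with name, hash, width, height, size. Returns (keep, quarantine)."""
    remaining = list(images)
    keep, quarantine = [], []
    while remaining:
        seed = remaining.pop(0)
        group, rest = [seed], []
        for im in remaining:
            (group if hamming(seed["hash"], im["hash"]) <= THRESHOLD else rest).append(im)
        remaining = rest
        # Assumed quality rule: most pixels wins, file size breaks ties.
        best = max(group, key=lambda im: (im["width"] * im["height"], im["size"]))
        keep.append(best["name"])
        quarantine.extend(im["name"] for im in group if im is not best)
    return keep, quarantine
```

Because dHash compares relative brightness rather than raw bytes, a resized or re-encoded copy of a photo usually lands within a few bits of the original.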

Safety First (Dry Run)

Want to see what happens without moving any files? Just add --dry-run:

aipt run-all /path/to/archive --dry-run

Restore Workflow

If a file was incorrectly quarantined or you wish to revert an action:

aipt restore /path/to/archive
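A manifest-driven restore can be sketched like this. The manifest location, its "original" and "quarantined" fields and the restore_all helper are hypothetical; the point is that restore never overwrites an existing file and keeps unresolved entries for manual review.

```python
import json
import shutil
from pathlib import Path

MANIFEST = Path("_quarantine") / "quarantine_manifest.json"  # hypothetical location


def restore_all(archive_root: Path) -> list[str]:
    """Move each quarantined file back to its recorded original path; skip conflicts."""
    manifest_path = archive_root / MANIFEST
    entries = json.loads(manifest_path.read_text())
    restored, unresolved = [], []
    for entry in entries:
        src = archive_root / entry["quarantined"]
        dst = archive_root / entry["original"]
        if not src.exists() or dst.exists():
            # Missing source or occupied destination: leave it in the manifest.
            unresolved.append(entry)
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dst))
        restored.append(entry["original"])
    # Rewrite the manifest so it only lists files still in quarantine.
    manifest_path.write_text(json.dumps(unresolved, indent=2))
    return restored
```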

⚡️ GPU Acceleration

AIPT automatically detects your hardware and chooses the fastest available video decoder.

GPU Brand         Technology               Supported OS
Apple Silicon     videotoolbox             macOS
NVIDIA            cuda, nvdec              Windows, Linux
Intel (Arc/UHD)   qsv (QuickSync), vaapi   Windows, Linux
AMD (Radeon)      amf, vaapi, d3d11va      Windows, Linux

How to optimize

  1. Drivers: Ensure your GPU drivers are up to date.
  2. FFmpeg: Ensure your FFmpeg build includes support for these hardware decoders (the default builds from brew, apt, and winget usually do). Run ffmpeg -hwaccels to list the methods your build supports.
  3. Detection: AIPT will print HW accel: [name] at the start of an integrity scan so you can verify it's using your GPU.

Contributing

Contributions are welcome. Please ensure that PRs adhere to the conservative safety philosophy of the project.

License

Apache License 2.0 + Commons Clause. See LICENSE for details.
