S3 Large File Copy Tool

A high-performance Rust CLI for moving large files (>5 GB) between Amazon S3 buckets. Built for speed, cost-efficiency, and reliability.

Why this tool?

I wrote this tool because I could not find a simple way to move large files between S3 buckets. The S3 CopyObject operation does not support objects larger than 5 GB in a single operation (see official documentation). Another solution available from AWS Support is a Python Lambda driven by S3 Batch Operations. Unfortunately, it takes hours and often fails for large files once it hits the 15-minute Lambda timeout. My goal was simple: a clean command-line interface (CLI) that does the job (VERY) fast and offers plenty of tuning options and customization. It is satisfying to see a single 100 GB file transfer in less than 30 seconds with one command, run from an EC2 instance in my VPC.
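
The 5 GB single-operation limit is why a copy of this size has to go through a multipart upload, with each part copied by byte range. As a rough illustration only (not this tool's actual code), the sketch below plans those ranges under the documented S3 multipart limits of 5 MiB to 5 GiB per part and at most 10,000 parts. The function names and the `preferred` parameter are hypothetical.

```rust
// Hypothetical sketch: plan byte ranges for a multipart copy.
// The limits are the documented S3 multipart limits; all names are illustrative.
const MIB: u64 = 1024 * 1024;
const GIB: u64 = 1024 * MIB;
const MIN_PART_SIZE: u64 = 5 * MIB; // minimum for every part except the last
const MAX_PART_SIZE: u64 = 5 * GIB; // maximum size of a single part
const MAX_PARTS: u64 = 10_000; // maximum parts per multipart upload

/// Picks a part size near `preferred` that respects the limits above.
/// Returns `None` if the object cannot fit in `MAX_PARTS` parts.
fn choose_part_size(object_size: u64, preferred: u64) -> Option<u64> {
    // Smallest part size that keeps the part count within MAX_PARTS
    // (ceiling division).
    let floor = (object_size + MAX_PARTS - 1) / MAX_PARTS;
    let size = preferred.clamp(MIN_PART_SIZE, MAX_PART_SIZE).max(floor);
    if size > MAX_PART_SIZE {
        None
    } else {
        Some(size)
    }
}

/// Splits `object_size` bytes into inclusive, zero-based `(first, last)` ranges.
fn plan_ranges(object_size: u64, part_size: u64) -> Vec<(u64, u64)> {
    assert!(part_size > 0, "part size must be positive");
    let mut ranges = Vec::new();
    let mut first = 0;
    while first < object_size {
        let last = (first + part_size).min(object_size) - 1;
        ranges.push((first, last));
        first = last + 1;
    }
    ranges
}

/// Formats a range as the value of the `x-amz-copy-source-range` header
/// used by the UploadPartCopy operation.
fn copy_source_range(range: (u64, u64)) -> String {
    format!("bytes={}-{}", range.0, range.1)
}
```

For a 100 GiB object and a 64 MiB preferred part size, `plan_ranges` yields 1,600 ranges, the first of which formats as `bytes=0-67108863`. Each range can then be copied independently, which is what makes high concurrency possible.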

Key Features

  • 🚀 Performance: High-concurrency multipart engine with adaptive tuning.
  • 💰 Cost Aware: Live S3 pricing via the s3-pricing crate, with fallback regional estimates when the Pricing API is unavailable.
  • 🛠️ Flexible: Support for all storage classes, KMS encryption, and property preservation.
  • ✅ Reliable: Automatic cleanup on failure and checksum-based integrity verification.
  • 🚄 Auto-Mode: Intelligent optimization of part sizes and thread counts.
  • 📁 Recursive Sync: Copy entire prefixes with include/exclude glob filters.
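
To make the include/exclude filtering in the last bullet concrete, here is a minimal sketch of how such a filter could work. It is not taken from this tool's source: the function names are hypothetical, and the semantics are assumptions (`*` matches any run of characters including `/`, `?` matches one character, patterns are matched against the whole key, an empty include list means "include everything", and an exclude match always wins). The real CLI may behave differently.

```rust
// Hypothetical sketch of include/exclude glob filtering for object keys.
// Semantics are assumptions for illustration, not the tool's documented behavior.

/// Returns true if `text` matches `pattern`, where `*` matches any run of
/// characters (including `/`) and `?` matches exactly one character.
fn glob_match(pattern: &str, text: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let t: Vec<char> = text.chars().collect();
    let (mut pi, mut ti) = (0usize, 0usize);
    // Index of the last `*` in the pattern and the text index it was tried at.
    let mut star: Option<(usize, usize)> = None;

    while ti < t.len() {
        if pi < p.len() && p[pi] == '*' {
            // Remember the star and first try matching it against nothing.
            star = Some((pi, ti));
            pi += 1;
        } else if pi < p.len() && (p[pi] == '?' || p[pi] == t[ti]) {
            pi += 1;
            ti += 1;
        } else if let Some((sp, st)) = star {
            // Backtrack: let the last `*` absorb one more character.
            pi = sp + 1;
            ti = st + 1;
            star = Some((sp, st + 1));
        } else {
            return false;
        }
    }
    // Any trailing `*` in the pattern can match the empty string.
    while pi < p.len() && p[pi] == '*' {
        pi += 1;
    }
    pi == p.len()
}

/// Decides whether a key should be copied: it must match at least one include
/// pattern (or the include list is empty) and no exclude pattern.
fn should_copy(key: &str, include: &[&str], exclude: &[&str]) -> bool {
    let included = include.is_empty() || include.iter().any(|p| glob_match(p, key));
    let excluded = exclude.iter().any(|p| glob_match(p, key));
    included && !excluded
}
```

Under these assumptions, `should_copy("part-0001.parquet", &["*.parquet"], &[".git/*"])` is true, while a key such as `.git/objects/x.parquet` is rejected because the exclude pattern takes precedence.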

Quick Start

# Basic copy
./s3_largecopy -s source-bucket -k data.iso -b dest-bucket -t data.iso

# Recursive prefix sync with filters
./s3_largecopy --source-bucket source-bucket \
  --source-prefix dataset/raw/ \
  --dest-bucket dest-bucket \
  --dest-prefix backups/raw/ \
  --include "*.parquet" --exclude ".git/*"

# Live pricing lookup
./s3_largecopy --get-price --region us-east-1 --storage-class STANDARD

# Cost estimate before copy
./s3_largecopy -s source-bucket -k data.iso -b dest-bucket -t data.iso --estimate

--get-price depends on live AWS Pricing API access. --estimate uses the same live pricing path when available and falls back to bundled regional pricing data if the lookup cannot be completed.
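
The fallback behavior described above can be pictured roughly as follows. This is a sketch under stated assumptions, not the tool's implementation: `resolve_price`, `Quote`, `PriceSource`, and the shape of the bundled table are all hypothetical, and the live lookup is modeled as a plain closure rather than a call into the s3-pricing crate.

```rust
use std::collections::HashMap;

// Hypothetical sketch: prefer a live price, fall back to bundled regional data.
// All types and names here are illustrative assumptions.

/// Where a price came from, so the caller can tell the user.
#[derive(Debug, PartialEq)]
enum PriceSource {
    Live,
    Bundled,
}

/// A unit price together with its origin.
#[derive(Debug, PartialEq)]
struct Quote {
    unit_price_usd: f64,
    source: PriceSource,
}

/// Tries `live_lookup` first. If it fails, looks the region up in the
/// `bundled` table. Returns `None` when neither source has a price.
fn resolve_price<F>(
    region: &str,
    live_lookup: F,
    bundled: &HashMap<String, f64>,
) -> Option<Quote>
where
    F: Fn(&str) -> Result<f64, String>,
{
    match live_lookup(region) {
        Ok(price) => Some(Quote {
            unit_price_usd: price,
            source: PriceSource::Live,
        }),
        // The live lookup could not be completed: use bundled data if present.
        Err(_) => bundled.get(region).map(|&price| Quote {
            unit_price_usd: price,
            source: PriceSource::Bundled,
        }),
    }
}
```

Carrying the `PriceSource` alongside the number is one way to let an estimate report whether it is based on live or bundled pricing. A `--get-price`-style path would presumably skip the fallback branch and surface the lookup error instead.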

Documentation

Detailed documentation is available in the docs/ directory:

  • Installation - Prerequisites and build instructions (incl. static binaries).
  • Usage & Examples - Basic commands, advanced features, and CLI reference.
  • Architecture - How the engine works, diagrams, and internal modules.
  • Auto Mode - Deep dive into adaptive tuning and profiles.
  • Cost Analysis - Comprehensive guide to S3 transfer costs, live pricing, and estimation logic.
  • Permissions - IAM policy requirements and security configuration.
  • Troubleshooting - Performance tips and common error resolutions.
  • Changelog - Detailed history of changes and releases.

Author

Created and maintained by Bart Leboeuf.

License

This project is licensed under the MIT License.

Contributing

Contributions are welcome! See Future Features for roadmap ideas.