Problem
When writing a ZIP archive with multiple entries using ZipWriter, each call to start_file allocates a new DEFLATE compressor via GenericZipWriter::prepare_next_writer -> switch_to. The zlib-rs deflate state alone is ~380 KB per instance.
For archives with many small files this results in disproportionate total heap allocations:
┌────────────────────────────────┬─────────────────┬───────────┐
│ Workload │ Total Allocated │ Peak Live │
├────────────────────────────────┼─────────────────┼───────────┤
│ 500 x 1 KB files (DEFLATE) │ 207 MB │ 827 KB │
├────────────────────────────────┼─────────────────┼───────────┤
│ 500 x 1 KB files (TAR, no zip) │ 326 KB │ 152 KB │
└────────────────────────────────┴─────────────────┴───────────┘
91.7% of the allocated bytes come from zlib_rs::deflate::init via Compress::new, and 7.9% from the flate2::zio::Writer buffer. Both allocations happen inside switch_to.
Allocation call chain
ZipWriter::start_file
> GenericZipWriter::switch_to (write.rs:2053)
> prepare_next_writer (write.rs:1900)
> DeflateEncoder::new
> Compress::new (~380 KB per call)
> flate2::zio::Writer::new (~32 KB per call)
Profiling details
Measured with dhat 0.3 on zip 7.4.0, --release with debug symbols:
- 189,984,000 bytes (91.7%), 500 blocks - zlib_rs::deflate::init
- 16,384,000 bytes (7.9%), 500 blocks - flate2::zio::Writer::with_capacity
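The per-call sizes quoted in the call chain follow from dividing each dhat total by its block count. A minimal sketch of that arithmetic is below; the function names are illustrative only and are not part of zip, flate2, or dhat.

```rust
/// Average size of one allocation, given a dhat-style byte total and block count.
/// (Hypothetical helper for illustration; not part of any crate.)
fn bytes_per_block(total_bytes: u64, blocks: u64) -> u64 {
    total_bytes / blocks
}

/// Reduces the two dhat entries reported above to a per-entry cost in bytes.
/// Returns (deflate state, writer buffer).
fn per_entry_costs() -> (u64, u64) {
    // zlib_rs::deflate::init: 189,984,000 bytes over 500 blocks
    let deflate_state = bytes_per_block(189_984_000, 500);
    // flate2::zio::Writer::with_capacity: 16,384,000 bytes over 500 blocks
    let writer_buffer = bytes_per_block(16_384_000, 500);
    (deflate_state, writer_buffer)
}
```

This gives 379,968 bytes (~380 KB) for the deflate state and 32,768 bytes (32 KiB) for the writer buffer, per entry.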
Suggested improvement
Reuse the DEFLATE compressor across entries by resetting its state (e.g. Compress::reset) instead of dropping it and allocating a new one for every entry. The flate2::Compress type supports reset(), which reinitializes the stream without reallocating the internal buffers.
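To make the difference concrete, here is a rough std-only model of the two strategies. MockCompressor is a hypothetical stand-in, not the flate2 or zip API; it only imitates a compressor whose state costs about as much as the zlib-rs deflate state measured above, and it tracks how many bytes each strategy requests in total.

```rust
/// Size of the modelled compressor state, taken from the dhat numbers above
/// (189,984,000 bytes / 500 blocks).
const STATE_BYTES: usize = 379_968;

/// Hypothetical stand-in for a DEFLATE compressor. NOT the flate2 API.
struct MockCompressor {
    state: Vec<u8>,
}

impl MockCompressor {
    /// Allocates fresh state and records the requested bytes in `allocated`.
    fn new(allocated: &mut usize) -> Self {
        *allocated += STATE_BYTES;
        MockCompressor { state: vec![0u8; STATE_BYTES] }
    }

    /// Clears the existing state in place; no new allocation is made.
    fn reset(&mut self) {
        self.state.fill(0);
    }
}

/// Current behaviour as described in this issue: one new compressor per entry.
/// Returns the total number of bytes requested.
fn allocate_per_entry(entries: usize) -> usize {
    let mut allocated = 0;
    for _ in 0..entries {
        let _compressor = MockCompressor::new(&mut allocated);
        // ... compress one entry here; the compressor is dropped afterwards
    }
    allocated
}

/// Suggested behaviour: allocate once, reset between entries.
/// Returns the total number of bytes requested.
fn reuse_across_entries(entries: usize) -> usize {
    let mut allocated = 0;
    let mut compressor = MockCompressor::new(&mut allocated);
    for _ in 0..entries {
        compressor.reset();
        // ... compress one entry here with the same compressor
    }
    allocated
}
```

In this model, 500 entries request 189,984,000 bytes with per-entry allocation and 379,968 bytes with reuse. How the real ZipWriter would hold on to the compressor between entries is left open here.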
Environment
- zip 7.4.0
- macOS aarch64, Rust 1.89.0