Deflate encoder is re-allocated per entry in ZipWriter, causing excessive heap allocations #633

@bug-ops

Problem

When writing a ZIP archive with multiple entries using ZipWriter, each call to start_file allocates a new DEFLATE compressor via GenericZipWriter::prepare_next_writer -> switch_to. The zlib-rs deflate state alone is ~380 KB per instance.

For archives with many small files, this results in disproportionately large total heap allocations:

  ┌────────────────────────────────┬─────────────────┬───────────┐
  │            Workload            │ Total Allocated │ Peak Live │
  ├────────────────────────────────┼─────────────────┼───────────┤
  │ 500 x 1 KB files (DEFLATE)     │          207 MB │    827 KB │
  ├────────────────────────────────┼─────────────────┼───────────┤
  │ 500 x 1 KB files (TAR, no zip) │          326 KB │    152 KB │
  └────────────────────────────────┴─────────────────┴───────────┘

91.7% of allocated bytes come from zlib_rs::deflate::init via Compress::new, and 7.9% from the flate2::zio::Writer buffer, both inside switch_to.

Allocation call chain

  ZipWriter::start_file
    > GenericZipWriter::switch_to        (write.rs:2053)
      > prepare_next_writer              (write.rs:1900)
        > DeflateEncoder::new
          > Compress::new                (~380 KB per call)
          > flate2::zio::Writer::new     (~32 KB per call)

Profiling details

Measured with dhat 0.3 on zip 7.4.0, --release with debug symbols:

  1. 189,984,000 bytes (91.7%), 500 blocks - zlib_rs::deflate::init
  2. 16,384,000 bytes (7.9%), 500 blocks - flate2::zio::Writer::with_capacity
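
As a quick sanity check, the per-entry cost can be derived from the two dhat totals above by dividing by the 500 blocks. The helper name `bytes_per_block` is made up for this illustration; only the byte counts and block counts come from the profile.

```rust
/// Bytes per allocation block, given a dhat byte total and its block count.
/// (Hypothetical helper, not part of dhat or zip.)
fn bytes_per_block(total_bytes: u64, blocks: u64) -> u64 {
    total_bytes / blocks
}

fn main() {
    // Figures copied from the dhat output above.
    let deflate_state = bytes_per_block(189_984_000, 500);
    let writer_buffer = bytes_per_block(16_384_000, 500);

    println!("deflate state per entry: {deflate_state} bytes");
    println!("writer buffer per entry: {writer_buffer} bytes");
    println!("combined per entry:      {} bytes", deflate_state + writer_buffer);
}
```

This gives 379,968 bytes for the deflate state and 32,768 bytes for the writer buffer, matching the ~380 KB and ~32 KB figures in the call chain.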

Suggested improvement

Reuse the DEFLATE compressor across entries by resetting its state (e.g. Compress::reset) instead of dropping and re-allocating it. The flate2::Compress type supports reset(), which reinitializes the stream without reallocating the internal buffers.
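
To illustrate the difference, here is a rough std-only sketch of the two strategies. `FakeCompressor` is a hypothetical stand-in that only models "expensive to create, cheap to reset"; it is not flate2::Compress and does no real compression. `STATE_BYTES` is taken from the per-block figure in the profile above. How this would actually be wired into GenericZipWriter is left open here.

```rust
/// Per-instance state size, taken from the dhat profile (189,984,000 / 500).
const STATE_BYTES: usize = 379_968;

/// Hypothetical stand-in for a compressor with large internal state.
/// NOT flate2::Compress; it only tracks how many bytes get allocated.
struct FakeCompressor {
    state: Vec<u8>,
}

impl FakeCompressor {
    /// Allocates fresh state and records the allocation in `allocated`.
    fn new(allocated: &mut usize) -> Self {
        *allocated += STATE_BYTES;
        Self { state: vec![0u8; STATE_BYTES] }
    }

    /// Clears the state in place, without allocating.
    fn reset(&mut self) {
        self.state.fill(0);
    }

    /// Pretends to compress by copying input bytes into the state buffer.
    fn compress_entry(&mut self, data: &[u8]) {
        for (slot, byte) in self.state.iter_mut().zip(data) {
            *slot = *byte;
        }
    }
}

/// Current behaviour as described in the issue: a new compressor per entry.
fn fresh_per_entry(entries: &[&[u8]]) -> usize {
    let mut allocated = 0;
    for data in entries {
        let mut compressor = FakeCompressor::new(&mut allocated);
        compressor.compress_entry(data);
    }
    allocated
}

/// Suggested behaviour: allocate once, reset between entries.
fn reuse_across_entries(entries: &[&[u8]]) -> usize {
    let mut allocated = 0;
    let mut compressor = FakeCompressor::new(&mut allocated);
    for data in entries {
        compressor.reset();
        compressor.compress_entry(data);
    }
    allocated
}

fn main() {
    let payload = [0u8; 1024];
    let entries: Vec<&[u8]> = vec![&payload[..]; 500];
    println!("fresh per entry: {} bytes", fresh_per_entry(&entries));
    println!("reused:          {} bytes", reuse_across_entries(&entries));
}
```

In this model, 500 entries cost 189,984,000 bytes of state allocations with a fresh compressor per entry, versus 379,968 bytes when a single instance is reset between entries.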

Environment

  • zip 7.4.0
  • macOS aarch64, Rust 1.89.0
