wolfdisk/s3: real SigV4 auth + multipart, copy, batch delete, range, metadata#4

Merged: wolfsoftwaresystemsltd merged 1 commit into main from s3-sigv4-multipart on Jun 24, 2026

@wolfsoftwaresystemsltd

Owner

Summary

Brings the WolfDisk S3 gateway (wolfdisk/src/s3/) up to real S3 compatibility and fixes a startup panic.

Two pre-existing bugs fixed:

  • The S3 server panicked on boot when [s3] enabled = true: the catch-all route used axum 0.8 syntax (/{*path}), but the crate resolves to axum 0.7, which expects /*path. This went unnoticed because s3.enabled defaults to false.
  • Auth only string-matched the access key. The SigV4 signature and secret were never verified, so anyone who knew the (non-secret) access key was treated as authenticated.

New capabilities:

  • Real AWS SigV4 verification — header-based + presigned, constant-time compare, clock-skew check, and aws-chunked (streaming) body decoding. Verified against the AWS-published example vector. No credentials configured still means "allow all" (unchanged private-cluster behaviour).
  • Multipart upload (Create/UploadPart/Complete/Abort), CopyObject, DeleteObjects (batch), Range GET (206), correct MD5 ETags (<md5>-N for multipart), Content-Type and x-amz-meta-* user metadata.
  • Reference-counted chunk deletion in S3 delete/overwrite paths — deleting one of two identical (deduplicated) objects no longer corrupts the other.
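
To illustrate the reference-counting idea in the last item, here is a minimal sketch. It assumes chunks are keyed by a content hash and that counts live in an in-memory map; the PR does not show how the gateway stores its counts, so `ChunkRefs`, `acquire`, and `release` are hypothetical names rather than the actual implementation.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory reference counter keyed by chunk hash.
#[derive(Default)]
struct ChunkRefs {
    counts: HashMap<String, usize>,
}

impl ChunkRefs {
    /// Record that one more object (or in-flight part) references `hash`.
    fn acquire(&mut self, hash: &str) {
        *self.counts.entry(hash.to_string()).or_insert(0) += 1;
    }

    /// Drop one reference. Returns true only when the last reference is gone,
    /// i.e. when the chunk's bytes may safely be removed from the store.
    fn release(&mut self, hash: &str) -> bool {
        let remaining = match self.counts.get_mut(hash) {
            Some(n) => {
                // Entries are removed when they reach zero, so n >= 1 here.
                *n -= 1;
                *n
            }
            // Unknown chunk: nothing to delete.
            None => return false,
        };
        if remaining == 0 {
            self.counts.remove(hash);
            true
        } else {
            false
        }
    }
}
```

With two deduplicated objects sharing one chunk, the first `release` returns false and leaves the chunk in place; only the second returns true. A delete or overwrite path would remove chunk data only on a true result.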

Blast radius

Contained to src/s3/* plus a 2-line wiring change in main.rs. No changes to FileEntry, FUSE, replication, or the index/wire format. S3 object metadata is kept in a node-local sidecar (<data_dir>/index/s3_meta.json), consistent with how the existing symlink_target field is not replicated.

Testing

  • 12 unit tests (SigV4 vector, aws-chunked decode, XML parsing, constant-time compare).
  • Live end-to-end against the real wolfdisk mount binary driven by an independent stdlib SigV4 client: every operation incl. multipart, copy, batch delete, range, metadata, streaming upload, bad-signature → 403, and dedup-safe delete/overwrite/abort.
  • Bidirectional FUSE↔S3 visibility verified on a shared data dir.
  • cargo build + cargo clippy clean on the new code.

Known limitations (documented)

  • Single PUT and UploadPart bodies are buffered in memory, up to 512 MB each (use multipart upload for larger objects).
  • Per-chunk streaming signatures aren't individually re-verified (request is authenticated by its seed signature).
  • ACLs, versioning, and bucket policies are not implemented.
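
For context on the streaming limitation above: in the aws-chunked framing, each chunk is sent as `<hex-size>;chunk-signature=<sig>\r\n<data>\r\n`, and a zero-size chunk ends the body. The following is a rough sketch of a decoder that extracts the payload and skips the per-chunk signatures. It is not the gateway's code; `decode_aws_chunked` is a hypothetical name, and trailers after the final chunk are simply ignored.

```rust
/// Decode an aws-chunked body into the raw payload.
/// Chunk signatures are parsed past but NOT verified (illustrative only).
fn decode_aws_chunked(mut body: &[u8]) -> Result<Vec<u8>, String> {
    let mut out = Vec::new();
    loop {
        // Header line: "<hex-size>[;chunk-signature=<sig>]\r\n"
        let eol = body
            .windows(2)
            .position(|w| w == b"\r\n")
            .ok_or("missing CRLF after chunk header")?;
        let header = std::str::from_utf8(&body[..eol]).map_err(|e| e.to_string())?;
        let size_hex = header.split(';').next().unwrap_or("").trim();
        let size = usize::from_str_radix(size_hex, 16).map_err(|e| e.to_string())?;
        body = &body[eol + 2..];

        if size == 0 {
            // Final chunk; anything after it is ignored in this sketch.
            return Ok(out);
        }
        // Need `size` data bytes plus the trailing CRLF.
        if body.len() < size.saturating_add(2) {
            return Err("truncated chunk".into());
        }
        out.extend_from_slice(&body[..size]);
        if &body[size..size + 2] != b"\r\n" {
            return Err("missing CRLF after chunk data".into());
        }
        body = &body[size + 2..];
    }
}
```

A verifier that re-checked each chunk would additionally recompute every `chunk-signature` from the previous signature and the chunk's hash; this sketch skips that step, which matches the limitation described above.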

Built by CodeWolf & Wolf Software Systems Ltd

…metadata

The S3 gateway previously only string-matched the access key (no signature
verification) and supported a basic object/bucket set. It also failed to start:
the catch-all route used axum-0.8 syntax (`/{*path}`) while the crate resolves to
axum 0.7, so enabling `[s3]` panicked on boot. This went unnoticed because
`s3.enabled` defaults to false.

Changes are contained to src/s3/* plus a 2-line wiring change in main.rs; no
changes to FileEntry, FUSE, replication, or the index/wire format.

- auth.rs: full AWS Signature V4 verification (header-based and presigned query),
  constant-time signature comparison, clock-skew check, and aws-chunked
  (STREAMING-AWS4-HMAC-SHA256-PAYLOAD) body decoding. Verified against the
  AWS-published example vector. No credentials configured still means "allow all"
  (unchanged private-cluster behaviour).
- server.rs: fix the route to axum-0.7 `/*path`; add CopyObject, DeleteObjects
  (batch), multipart upload (Create/UploadPart/Complete/Abort), Range GET (206),
  correct MD5 ETags (and `<md5>-N` for multipart), Content-Type, and
  `x-amz-meta-*` user metadata. DeleteBucket now sweeps synthetic intermediate
  directories so a bucket with no objects is deletable.
- Reference-counted chunk deletion in all S3 delete/overwrite paths: deleting one
  of two identical (deduplicated) objects no longer corrupts the other, and an
  in-flight multipart part's chunks are never GC'd.
- meta.rs: self-contained S3 metadata sidecar (Content-Type/ETag/user-metadata,
  persisted to <data_dir>/index/s3_meta.json) and in-memory multipart registry,
  plus tolerant XML parsing for the Complete/Delete request bodies. S3 metadata
  is node-local (not carried by the replication wire), consistent with the
  existing symlink_target field.
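
As an illustration of the constant-time signature comparison mentioned for auth.rs, a minimal sketch might look like the following. The function name is hypothetical and the gateway's own helper is not shown in this PR; the point is only that the loop does not exit at the first mismatching byte.

```rust
/// Compare two byte strings without returning early at the first mismatch.
/// Hypothetical helper, not taken from the gateway source.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    // The length of a SigV4 signature is not secret (64 hex characters),
    // so a length mismatch may return immediately.
    if a.len() != b.len() {
        return false;
    }
    // OR together the XOR of every byte pair; the result is zero
    // only if all bytes match.
    let mut diff: u8 = 0;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y;
    }
    diff == 0
}
```

An ordinary `==` on strings can stop at the first differing byte, which in principle lets an attacker learn a signature prefix from response timing. Production code often uses a vetted crate for this instead of a hand-written loop, since an optimizing compiler is free to transform the loop.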

Tested: 12 unit tests (SigV4 vector, aws-chunked, XML parsing, ct-compare); live
end-to-end against the real `wolfdisk mount` binary with an independent stdlib
SigV4 client (all operations incl. multipart, copy, batch delete, range,
metadata, streaming upload, bad-signature rejection, dedup-safe delete);
bidirectional FUSE<->S3 visibility on a shared data dir. cargo build + clippy
clean on the new code.

Known limitations (documented): single-PUT/part bodies buffer in memory up to
512MB (use multipart for larger); per-chunk streaming signatures are not
individually re-verified (request authenticated by its seed signature); ACLs,
versioning and bucket policies are not implemented.

Co-Authored-By: CodeWolf <paul@wolf.uk.com>
Co-Authored-By: Wolf Software Systems Ltd <paul@wolf.uk.com>
@wolfsoftwaresystemsltd wolfsoftwaresystemsltd merged commit caff859 into main Jun 24, 2026
1 check passed
@wolfsoftwaresystemsltd wolfsoftwaresystemsltd deleted the s3-sigv4-multipart branch June 24, 2026 09:44