Add oversize submission detection and backward-compatible DB flag #73

Merged
bdc34 merged 2 commits into develop from bdc34/SUBMISSION-5-oversize
Jun 10, 2026

@bdc34 bdc34 commented Jun 9, 2026

Detect oversize submissions at file-upload time and persist the flag to the classic arXiv_submissions.is_oversize column. The flag is a soft gate: the submitter is warned but can still proceed. The auto-hold effect at finalize is a later phase.

Adds policy module:

  • submit_ce/domain/size_limits.py: SizeLimits (three per-archive limits, default 50 MB) and a pure check_sizes() decision function. check_sizes() enforces the total and per-file uncompressed limits, matching legacy check_sizes; the compressed limit is defined but not enforced. Includes OVERRIDE/MAXSIZE env escapes.
  • ui/config.py: MAX_UNCOMPRESSED_TOTAL_KB / _PER_FILE_KB / _COMPRESSED_KB.
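A minimal sketch of what a pure size decision along these lines could look like. The field names, the byte units, the reading of "50 MB" as 50 * 1024 * 1024 bytes, and the strict `>` comparison are all assumptions for illustration; the OVERRIDE/MAXSIZE escapes are left out because their semantics are not described here. This is not the code in size_limits.py.

```python
from dataclasses import dataclass
from typing import Mapping

# Assumption: "default 50 MB" is read as 50 * 1024 * 1024 bytes.
DEFAULT_LIMIT_BYTES = 50 * 1024 * 1024


@dataclass(frozen=True)
class SizeLimits:
    """Hypothetical shape: three limits, all expressed in bytes."""
    uncompressed_total: int = DEFAULT_LIMIT_BYTES
    uncompressed_per_file: int = DEFAULT_LIMIT_BYTES
    compressed: int = DEFAULT_LIMIT_BYTES  # defined, but not checked below


def check_sizes(total: int, per_file: Mapping[str, int], limits: SizeLimits) -> bool:
    """Return True when the upload is oversize.

    Pure function: the result depends only on the arguments, so it can be
    unit tested without a workspace, a database, or a Flask app.
    """
    if total > limits.uncompressed_total:
        return True
    # A single large file trips the flag even if the total is under the limit.
    return any(size > limits.uncompressed_per_file for size in per_file.values())
```

Keeping the decision free of I/O means the caller decides where the sizes and limits come from, which is what lets the UI config override the defaults without touching the policy.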

Adds detection in the file Event:

  • Submission.is_oversize: flag set when files change.
  • UploadArchive/UploadFiles/RemoveFiles evaluate the authoritative workspace against the limits in execute(), persist the result, and apply it in project() (deterministic on replay); RemoveAllFiles clears it.
  • SubmitApi.get_size_limits() (defaults) + Flask override reading config, so the domain event reaches the limits through the API boundary.
  • upload controller flashes an oversize warning.
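The execute()/project() split described in the list above can be sketched roughly as follows. The class `UploadFilesSketch`, its method signatures, and the single-limit comparison are hypothetical stand-ins, not the real event classes; the point is only that the verdict is computed once and stored on the event, so replay never has to look at the workspace again.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Submission:
    """Hypothetical, trimmed-down submission state."""
    is_oversize: bool = False


@dataclass
class UploadFilesSketch:
    """Hypothetical event; not the real UploadFiles class."""
    is_oversize: Optional[bool] = None  # filled in by execute()

    def execute(self, total_bytes: int, limit_bytes: int) -> None:
        # Side-effecting step: inspect the live workspace once and
        # record the verdict on the event itself.
        self.is_oversize = total_bytes > limit_bytes

    def project(self, submission: Submission) -> Submission:
        # Pure step: uses only the stored verdict, so replaying the
        # event later yields the same state even if limits have changed.
        if self.is_oversize is None:
            return submission
        return replace(submission, is_oversize=self.is_oversize)
```

Under this shape, changing the configured limits after the fact does not rewrite history: old events keep the verdict that was true when they ran.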

Adds submit 1.5 backward-compatible DB projection:

  • update_from_submission writes arXiv_submissions.is_oversize; to_submission reads it back so rows round-trip.
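As an illustration of the round-trip property, here is a small sketch using the standard-library sqlite3 module as a stand-in for the real database layer. The helper names `write_oversize` and `read_submission`, the `submission_id` key, and the 0/1 integer column type are assumptions; only the table and column names come from the description above.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Submission:
    """Hypothetical, trimmed-down domain object."""
    submission_id: int
    is_oversize: bool = False


def make_db() -> sqlite3.Connection:
    # In-memory stand-in for the classic table; the schema is an assumption.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE arXiv_submissions ("
        "submission_id INTEGER PRIMARY KEY, "
        "is_oversize INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def write_oversize(conn: sqlite3.Connection, sub: Submission) -> None:
    # Plays the role of update_from_submission: domain -> row.
    conn.execute(
        "INSERT INTO arXiv_submissions (submission_id, is_oversize) VALUES (?, ?) "
        "ON CONFLICT(submission_id) DO UPDATE SET is_oversize = excluded.is_oversize",
        (sub.submission_id, int(sub.is_oversize)),
    )


def read_submission(conn: sqlite3.Connection, submission_id: int) -> Submission:
    # Plays the role of to_submission: row -> domain.
    row = conn.execute(
        "SELECT is_oversize FROM arXiv_submissions WHERE submission_id = ?",
        (submission_id,),
    ).fetchone()
    return Submission(submission_id, bool(row[0]))
```

A round-trip test then only needs to write a submission, read it back, and compare the two objects.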

Tests: size_limits unit tests, event detection tests, and UI/DB persistence tests (column write + domain round-trip).

STILL TODO: auto-hold oversize submissions on finalize.

Co-Authored-By: Claude Opus 4.8 (1M context)

if workspace is None:
    return False
per_file = {file.path: file.bytes for file in workspace.files}
total = workspace.size or 0

@bmaltzan bmaltzan Jun 10, 2026

So per_file is the sum of the file space used in src/?
Is per_file the same as workspace.size, or is the comment wrong?
Or is workspace.size out of date?

class Workspace(BaseModel):
    size: Optional[int] = None
    """Size in bytes of the uncompressed upload workspace."""

bdc34 (Contributor, Author) replied:

The per_file here is a dict of file.path -> file.bytes

bdc34 (Contributor, Author) replied:

workspace.size should still be the uncompressed size of the workspace.

per_file is there to detect single files that are over the size limit.
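To make the distinction in this thread concrete, here is a small sketch of the two inputs side by side. `FileEntry`, the dataclass-based `Workspace`, and `size_inputs` are hypothetical stand-ins (the real Workspace is a pydantic BaseModel); only `workspace.size`, `workspace.files`, `file.path`, and `file.bytes` come from the snippet under review.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class FileEntry:
    """Hypothetical file record with the two attributes the snippet reads."""
    path: str
    bytes: int


@dataclass
class Workspace:
    """Dataclass stand-in for the pydantic model quoted above."""
    size: Optional[int] = None  # uncompressed size of the whole workspace
    files: List[FileEntry] = field(default_factory=list)


def size_inputs(workspace: Optional[Workspace]) -> Tuple[int, Dict[str, int]]:
    """Return (total, per_file) as two separate inputs to a size check."""
    if workspace is None:
        return 0, {}
    # per_file maps each path to that one file's size; it is not a sum.
    per_file = {f.path: f.bytes for f in workspace.files}
    # total is the workspace-level figure, falling back to 0 when unset.
    total = workspace.size or 0
    return total, per_file
```

A name such as `bytes_by_path` instead of `per_file` might make the dict-versus-total distinction easier to spot at a glance; that is a naming suggestion only, not something the PR adopts.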

@bmaltzan bmaltzan left a comment

The file size variable names were a little confusing, but otherwise looks fine.

@bdc34 bdc34 merged commit d26eb86 into develop Jun 10, 2026
1 check passed
@bdc34 bdc34 deleted the bdc34/SUBMISSION-5-oversize branch June 10, 2026 17:25
2 participants