Skip to content

refactor: replace insecure custom downloader with Down::NetHttp#260

Merged
alexskr merged 9 commits into
developfrom
refactor/file-download-and-copy
Jun 26, 2026
Merged

refactor: replace insecure custom downloader with Down::NetHttp#260
alexskr merged 9 commits into
developfrom
refactor/file-download-and-copy

Conversation

@alexskr

@alexskr alexskr commented Nov 14, 2025

Copy link
Copy Markdown
Member

refactor: replace insecure custom downloader with Down::NetHttp

Downloading

  • Replace the hand-rolled Net::HTTP/Net::FTP downloader with Down::NetHttp
  • Drop FTP support; download_file and remote_file_exists? now accept http/https only
  • Rework remote_file_exists? into FileHelpers (Down-based; non-http(s) returns false)
  • Add a configurable download size cap via the download_max_file_size setting
    (default 2 GB), enforced by Down; previously unbounded
  • Filename resolution now relies on Down's Content-Disposition / URL parsing

Filename hardening

  • Shared FileHelpers.sanitize_filename: strip control/unsafe chars (incl. NUL),
    remove leading dots, trim/collapse spaces, 255-char cap, fallback to "unnamed"
  • Add dedicated unit tests for sanitize_filename

Repository copy path (OntologySubmission.copy_file_repository)

  • Normalize src (accept Tempfile or String path) and validate existence
  • Atomic write: copy to a temp file then rename into place; clean up on failure

Dependencies

  • Add down; remove net-ftp
  • Move rack-test from a runtime to a development dependency
  • Replace thin with webrick for the test Rack server

 - drop ftp support
 - Harden repository copy path (OntologySubmission.copy_file_repository)
 - Normalize src (accept Tempfile or String path)
 - Sanitize destination filename (strip control/unsafe chars, trim/collapse spaces, remove leading dots, 255-char cap)
@codecov

codecov Bot commented Nov 14, 2025

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 86.04651% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.36%. Comparing base (c6f83eb) to head (a947052).

Files with missing lines Patch % Lines
...tologies_linked_data/models/ontology_submission.rb 83.33% 3 Missing ⚠️
lib/ontologies_linked_data/utils/file.rb 88.00% 3 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #260      +/-   ##
===========================================
+ Coverage    80.96%   81.36%   +0.40%     
===========================================
  Files          101      101              
  Lines         6892     6848      -44     
===========================================
- Hits          5580     5572       -8     
+ Misses        1312     1276      -36     
Flag Coverage Δ
ag 81.25% <86.04%> (+0.35%) ⬆️
fs 81.33% <86.04%> (+0.38%) ⬆️
gd 81.23% <86.04%> (+0.35%) ⬆️
unittests 81.36% <86.04%> (+0.40%) ⬆️
vo 81.27% <86.04%> (+0.38%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@alexskr alexskr changed the base branch from master to develop November 14, 2025 17:25
…ace Thin in tests

- Remove all FTP download logic (check_ftp_file, URI::FTP branch, net-ftp dependency)
- Move remote_file_exists? to LinkedData::Utils::FileHelpers and reimplement using Down::NetHttp
- Update download_file to use max_redirects keyword for consistency
- Remove Thin dependency and add Webrick for test environment
@alexskr alexskr marked this pull request as ready for review November 20, 2025 21:33
@alexskr alexskr requested a review from mdorf November 20, 2025 21:40
@alexskr alexskr marked this pull request as draft November 20, 2025 22:09
@alexskr alexskr marked this pull request as ready for review November 21, 2025 20:20
alexskr added 6 commits June 23, 2026 12:34
The atomic temp-write refactor renamed dst to dst_final/dst_tmp but two
error-path messages still interpolated #{dst}, which would raise NameError
and mask the real failure in both the rescue branch and the post-copy
sanity check.
File.basename raises ArgumentError on a NUL byte, so sanitize_filename
crashed when given a name containing a control byte (reachable via an
HTTP Content-Disposition filename). Strip control chars before calling
File.basename so the helper sanitizes rather than raises.

Add dedicated unit tests in test/util/test_file_helpers.rb covering
normal names, unnamed fallback, path/traversal stripping, control and
unsafe char removal, whitespace collapsing, and the 255-char cap.
Down::NetHttp.download returns the tempfile already opened at offset 0
(ensure_tempfile finishes with tempfile.open), so the explicit rewind
was a no-op.
Replace the hardcoded 512 MB cap with a download_max_file_size setting
(default 2 GB) so ops can raise the limit without a code change. The
keyword arg still overrides per-call; falls back to the setting when nil.
Down's original_filename already parses Content-Disposition and falls
back to the post-redirect URL path basename, and sanitize_filename maps
nil/empty to "unnamed". The old || chain's last_iri_fragment branch was
unreachable (File.basename never returns nil), so collapse it to a single
sanitize_filename(tmpfile.original_filename) call.
…load-and-copy

# Conflicts:
#	Gemfile
#	Gemfile.lock
#	lib/ontologies_linked_data/models/ontology_submission.rb
#	lib/ontologies_linked_data/utils/file.rb
#	test/models/test_ontology_submission.rb
@alexskr alexskr merged commit a8a8dfb into develop Jun 26, 2026
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant