Skip to content

[Port to dtq-dev] Issue 1364: tgz file preview fix#1338

Merged
milanmajchrak merged 2 commits into
dataquest-dev:dtq-devfrom
ufal:backport-1372-to-dtq-dev
Jun 23, 2026
Merged

[Port to dtq-dev] Issue 1364: tgz file preview fix#1338
milanmajchrak merged 2 commits into
dataquest-dev:dtq-devfrom
ufal:backport-1372-to-dtq-dev

Conversation

@kosarko

@kosarko kosarko commented Jun 15, 2026

Copy link
Copy Markdown

Port of ufal#1372 by @kuchtiak-ufal to dtq-dev.

Summary by CodeRabbit

  • New Features

    • Added --force flag to regenerate file previews even when previews already exist
    • Improved support for .tgz compressed archive files in preview generation
    • Updated authentication to use email-based approach (password no longer required)
  • Tests

    • Added test coverage for .tgz file preview handling
    • Updated preview generation tests to reflect new authentication method

* Issue 1364: tgz file preview fix

* resolve MR comments + fixed test

* test cleanup

* Update help message for force preview option

* get rid of the password requirement on FilePreview script

---------

Co-authored-by: Ondřej Košarko <kosarko@ufal.mff.cuni.cz>
(cherry picked from commit 00a2a37)
@coderabbitai

coderabbitai Bot commented Jun 15, 2026

Copy link
Copy Markdown

Review Change Stack

Warning

Review limit reached

@kuchtiak-ufal, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 10 minutes and 25 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 14c67dfb-869d-4a27-a2b4-bb986bd95fae

📥 Commits

Reviewing files that changed from the base of the PR and between f293727 and fda8995.

📒 Files selected for processing (2)
  • dspace-api/src/main/java/org/dspace/scripts/filepreview/FilePreviewConfiguration.java
  • dspace-server-webapp/src/test/java/org/dspace/app/rest/PreviewContentServiceImplIT.java
📝 Walkthrough

Walkthrough

The PR fixes .tgz filename detection in processGzipFile so it routes to processTarGzipFile, removes password-based authentication from the file-preview script in favour of email-only user resolution, adds a -f/--force CLI option to delete and regenerate existing preview content, and updates integration tests accordingly.

Changes

.tgz Preview Fix

Layer / File(s) Summary
.tgz suffix fix and integration test
dspace-api/.../PreviewContentServiceImpl.java, dspace-server-webapp/.../PreviewContentServiceImplIT.java
processGzipFile condition expanded to match .tar.gz or .tgz; PreviewContentServiceImplIT adds a .tgz bitstream fixture and testTgzContentWithGzipMimetype test.

FilePreview --force Flag and Auth Refactor

Layer / File(s) Summary
CLI option config and field declarations
dspace-api/.../FilePreviewConfiguration.java, dspace-api/.../FilePreview.java
FilePreviewConfiguration drops --password and adds --force; FilePreview gains a force boolean field and epersonMail string with updated imports.
Auth refactor and --force regeneration logic
dspace-api/.../FilePreview.java
CLI parsing adds --force and drops password validation; internalRun resolves the context user via getEperson (email only, no password); preview loop deletes existing PreviewContent when force=true, otherwise skips; info log added at generation start.
Integration test refactor and --force test
dspace-api/src/test/java/org/dspace/scripts/filepreview/FilePreviewIT.java
Removes testUnauthorizedPassword and password parameter from runScriptForItemWithBitstreams; all call sites updated to email-only; checkHandlerMessages helper added; testPreviewWithForce verifies deletion and regeneration on a second run with -f.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • dataquest-dev/DSpace#960: Directly overlaps with the authentication refactor in FilePreview/FilePreviewConfiguration, removing password-based CLI options and adjusting the script's user-resolution approach.
  • dataquest-dev/DSpace#936: Also modifies the file-preview CLI authentication flow in FilePreview and FilePreviewConfiguration and updates FilePreviewIT, which is the same set of files changed here.

Suggested reviewers

  • milanmajchrak
  • vidiecan
🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Description check ⚠️ Warning The description is minimal and lacks required template sections. It only provides a brief port reference without problem description, analysis, or manual testing information. Add Problem description, Analysis section, and Manual Testing checklist from the template to provide comprehensive context for reviewers.
Docstring Coverage ⚠️ Warning Docstring coverage is 8.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title directly relates to the main change: fixing tgz file preview functionality (Issue 1364) being ported to dtq-dev branch.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
dspace-api/src/test/java/org/dspace/scripts/filepreview/FilePreviewIT.java (1)

247-247: ⚡ Quick win

Avoid exact info-log count assertions in helper; they’re brittle.

hasSize(previewGenerationExpected ? 3 : 2) tightly couples tests to incidental logging and causes noisy failures (same pattern also appears in testPreviewWithForce, Line 209). Prefer asserting required messages only.

Suggested refactor
-        assertThat(messages, hasSize(previewGenerationExpected ? 3 : 2));
         assertThat(messages, hasItem(containsString("Generate the file previews for the specified item with " +
                 "the given UUID: " + item.getID())));
         assertThat(messages, hasItem(containsString("Authentication by user: " + user.getEmail())));
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dspace-api/src/test/java/org/dspace/scripts/filepreview/FilePreviewIT.java`
at line 247, The assertion using hasSize(previewGenerationExpected ? 3 : 2)
creates brittle tests that are tightly coupled to incidental logging details and
cause noisy failures when logging behavior changes. This pattern appears in both
the assertion at line 247 and in the testPreviewWithForce method at line 209.
Instead of asserting exact message counts, refactor both locations to assert
only the presence of required/essential log messages using containsInAnyOrder or
hasItems, which will make tests resilient to incidental logging changes and
focus on behavior that actually matters.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@dspace-api/src/main/java/org/dspace/scripts/filepreview/FilePreview.java`:
- Around line 206-209: The getEperson() method does not validate that
ePersonService.find() returned a non-null result, allowing null to be returned
and subsequently dereferenced at the call site, causing an NPE. After the
ePersonService.find(context, getEpersonIdentifier()) call in getEperson(), add a
null check and throw an AuthenticationException with a clear error message if
the result is null. This ensures fail-fast behavior and prevents null pointer
exceptions in downstream code that depends on getEperson() returning a valid
EPerson instance.
- Around line 89-90: The validation check in the condition at lines 89-90 only
validates that epersonMail is not null, but does not reject blank or
whitespace-only strings. Replace the null check with a blank-safe validation
method (such as StringUtils.isBlank()) to ensure that empty strings are also
rejected at parse-time when no eperson identifier is provided. This will enforce
a strict CLI contract and provide immediate error messaging instead of allowing
blank input to proceed and fail later.

---

Nitpick comments:
In `@dspace-api/src/test/java/org/dspace/scripts/filepreview/FilePreviewIT.java`:
- Line 247: The assertion using hasSize(previewGenerationExpected ? 3 : 2)
creates brittle tests that are tightly coupled to incidental logging details and
cause noisy failures when logging behavior changes. This pattern appears in both
the assertion at line 247 and in the testPreviewWithForce method at line 209.
Instead of asserting exact message counts, refactor both locations to assert
only the presence of required/essential log messages using containsInAnyOrder or
hasItems, which will make tests resilient to incidental logging changes and
focus on behavior that actually matters.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 69f90be9-f6ca-4374-84b5-013336e05bd7

📥 Commits

Reviewing files that changed from the base of the PR and between 3dc9cec and f293727.

📒 Files selected for processing (5)
  • dspace-api/src/main/java/org/dspace/content/PreviewContentServiceImpl.java
  • dspace-api/src/main/java/org/dspace/scripts/filepreview/FilePreview.java
  • dspace-api/src/main/java/org/dspace/scripts/filepreview/FilePreviewConfiguration.java
  • dspace-api/src/test/java/org/dspace/scripts/filepreview/FilePreviewIT.java
  • dspace-server-webapp/src/test/java/org/dspace/app/rest/PreviewContentServiceImplIT.java

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Ports Issue 1364 fix to dtq-dev by improving .tgz preview detection, adding a --force option to regenerate previews, and adjusting the file-preview script to run as an EPerson resolved by email (without password verification).

Changes:

  • Add .tgz handling to gzip/tar preview extraction logic.
  • Add -f/--force to delete existing preview content and regenerate it.
  • Update CLI script + integration tests to use email-only EPerson resolution (remove password option/usage).

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
dspace-server-webapp/src/test/java/org/dspace/app/rest/PreviewContentServiceImplIT.java Adds a new integration test bitstream for .tgz with gzip MIME type and exercises preview generation.
dspace-api/src/test/java/org/dspace/scripts/filepreview/FilePreviewIT.java Updates script ITs for email-only invocation and adds --force test coverage.
dspace-api/src/main/java/org/dspace/scripts/filepreview/FilePreviewConfiguration.java Adds --force option and removes password option from CLI configuration.
dspace-api/src/main/java/org/dspace/scripts/filepreview/FilePreview.java Implements --force regeneration behavior and switches authentication resolution to EPerson lookup by email/identifier.
dspace-api/src/main/java/org/dspace/content/PreviewContentServiceImpl.java Treats .tgz filenames as tar+gzip when processing gzip content.

@milanmajchrak milanmajchrak merged commit 15b296a into dataquest-dev:dtq-dev Jun 23, 2026
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants