Skip to content

UFAL/CLARIN-DSpace upgrade v9#1339

Open
milanmajchrak wants to merge 13 commits into
dtq-dev-9-basefrom
ufal/clarin-dspace-upgrade-v9
Open

UFAL/CLARIN-DSpace upgrade v9#1339
milanmajchrak wants to merge 13 commits into
dtq-dev-9-basefrom
ufal/clarin-dspace-upgrade-v9

Conversation

@milanmajchrak

Copy link
Copy Markdown
Collaborator

Problem description

Analysis

(Write here, if there is needed describe some specific problem. Erase it, when it is not needed.)

Problems

(Write here, if some unexpected problems occur during solving issues. Erase it, when it is not needed.)

Manual Testing (if applicable)

Copilot review

  • Requested review from Copilot

@coderabbitai

coderabbitai Bot commented Jun 18, 2026

Copy link
Copy Markdown

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 910acf8f-8a40-4e9d-9fd9-c0ce04046e64

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

milanmajchrak and others added 3 commits June 18, 2026 13:29
…vices)

First foundation tranche of the CLARIN-DSpace 7.6.5 -> 9.3 forward-port.

What's included (compiles, checkstyle + license headers clean, entity/schema
validated on h2 via hbm2ddl=validate, migrations applied at DB init):

- 19 CLARIN Flyway migrations (h2 x9, postgres x10) at original version numbers
  (Lindat schema, preview/report/matomo/clarin_token tables, share token,
  default licenses, 7z format). Purely additive.
- ~107 CLARIN dspace-api classes: license framework, user metadata/registration,
  verification + clarin tokens, item/workspace services, handle service +
  external handle, shibboleth headers, featured services, provenance, EPIC/PID
  base, ORCID caching, factories + DAOs.
- v9 API adaptation of ported code: javax.* -> jakarta.*, commons-lang -> lang3,
  NullArgumentException -> IllegalArgumentException, Hibernate 6 ORDINAL enum
  JdbcTypeCode fix (ClarinLicense.confirmation).
- 9 CLARIN entity <mapping> entries in hibernate.cfg.xml.
- dspace-api/pom.xml deps: matomo-java-tracker, nimbus-jose-jwt, itextpdf,
  jfree, zjsonpatch.
- Minimal additions to vanilla files: Handle (url/dead/deadSince),
  Util.formatNetId, HandlePlugin.getRepositoryName/getCanonicalHandlePrefix.

Deferred to later tranches (tracked in CLARIN_DSPACE_V9_PROGRESS.md): S3/sync
storage (AWS SDK v1->v2), preview, report/health/diff, matomo runtime,
PID/EPIC clients, versioning CLI, Spring bean wiring, REST layer, frontend.

See CLARIN_DSPACE_V9_PROGRESS.md for full inventory, status and testing evidence.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Tranche 2 of the CLARIN-DSpace 7.6.5 -> 9.3 backend port. Registers the data-layer
services/DAOs/factories ported in the previous commit so they are instantiated by
the DSpace Spring context.

- core-factory-services.xml: clarinServiceFactory, handleClarinServiceFactory
- core-dao-services.xml: 11 CLARIN DAO beans (license/label/mapping/user-registration/
  user-metadata/allowance/item/verification-token/matomo-report/token + HandleClarinDAOImpl)
- core-services.xml: 16 service beans (license framework, user metadata/registration,
  verification + clarin tokens, item/workspace, matomo report subscription,
  DspaceObjectClarin, AuthorizationBitstreamUtils, HandleClarinServiceImpl,
  EpicHandleServiceImpl, ProvenanceServiceImpl) + MatomoTracker wired to the existing
  v9 ${matomo.tracker.url} (factory requires it via @Autowired).

Deferred-feature beans (bitstream-sync/preview/report/matomo-runtime) intentionally
omitted until those classes are ported (tracked in CLARIN_DSPACE_V9_PROGRESS.md).

Validated: full Spring context boots cleanly (AccessStatusServiceTest 3/3, no
autowiring/bean-creation errors) with all CLARIN beans active.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…egy)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@milanmajchrak milanmajchrak changed the base branch from dtq-dev to dtq-dev-9-base June 19, 2026 07:02
milanmajchrak and others added 10 commits June 22, 2026 15:40
…ported items

From the independent BE review:
- M2: handle/PIDService.java reflectively loads the deferred PIDServiceEPICv2
  (Class.forName) -> ClassNotFoundException at runtime for its only path. Deferred it to
  _deferred/ (the live EPIC path is EpicHandleServiceImpl). dspace-api still compiles (EXIT 0).
- HONESTY (progress file §6g + §0): the BE port is a COMPILE-ONLY data/service skeleton, not a
  runtime-functional CLARIN backend. Documented all previously-untracked silently-not-ported items
  distinct from _deferred/: 24 CLARIN config files, 57 config modifications, 91 vanilla-.java
  modifications (only Handle/Util/HandlePlugin applied), 44 CLARIN test classes, 3 entity-mapping
  gaps (WorkspaceItem.shareToken, EPerson.welcomeInfo/canEditSubmissionMetadata, Item.isHidden),
  REST/OAI layer. Downgraded §0 "runtime-active" claim. Added §6f Docker/Playwright startup + blockers.

Nothing is skipped without a documented reason. See CLARIN_DSPACE_V9_PROGRESS.md §6f/§6g.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
BE tranche 3 — closes documented gaps from the independent review:
- 24 ADDED CLARIN config files: clarin-dspace.cfg, OAI crosswalks (lindat_cmdi/olac/
  metasharev2/elg/datacite_openaire/bibtex .xsl + descriptions), email templates
  (clarin_token, clarin_download_link(+admin), clarin_autoregistration, share_submission,
  matomo_report, report_diff), registries (datacite.xml, edm.xml, metashare-schema.xml),
  submission-forms_cs.xml, default_cs.license, features/enable-orcid.cfg, spiders, VERSION_D.
- Entity columns matching migrations (were unmapped → features inert): WorkspaceItem.shareToken
  (share-submission), EPerson.welcomeInfo + canEditSubmissionMetadata. dspace-api compiles (EXIT 0).

Still TODO (documented §6g): 57 MODIFIED config files (dspace.cfg include of clarin-dspace.cfg,
item-submission.xml, authentication-shibboleth.cfg, discovery.xml), remaining vanilla-file method
additions + their service wiring, REST/OAI layer, CLARIN tests.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
BE tranche 4 — first slice of the REST layer (the #1 functional blocker). Exposes the
/server/api/core/clarinlicenses, clarinlicenselabels, clarinlicenseresourcemappings,
clarinlruallowances endpoints the FE clarin-licenses module already consumes.

- 17 files: Rest models (ClarinLicense/Label/ResourceMapping/ResourceUserAllowance) +
  hateoas Resources + Converters + RestRepositories + ClarinLicenseNotFoundException.
- v9 adaptation: implement new RestModel.getTypePlural() (returns NAME+"s", matching the FE
  link names clarinlicenses/clarinlicenselabels/...); javax->jakarta; import order.
- Compiles (dspace-server-webapp EXIT 0) + checkstyle clean. Wires to the already-ported
  dspace-api ClarinLicense* services/DAOs (tranches 1-2).

Remaining REST (documented §6g): handle/epic-handle, user-metadata/registration, token,
import controllers, submission steps, OAI/CMDI.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
BE tranche 5 — more of the REST layer. Exposes endpoints for handle, epic-handle,
user-metadata, user-registration, verification-token, featured-service + their link
repositories (CLRUA<->user-metadata/registration, user-registration<->license,
resource-mapping<->license).

- 31 files: Rest models + hateoas Resources + Converters + RestRepositories + LinkRepositories.
- v9 adaptations: RestModel.getTypePlural() on all new models; javax->jakarta; import order.
- Compiles (dspace-server-webapp EXIT 0) + checkstyle clean.
- Deferred: ExternalHandleRestRepository (magic-link external handle; needs a RandomStringGenerator
  bean + DCDate.getCurrent() + external-handle flow not yet ported) -> _deferred/.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…e validation

BE tranche 6 — RUNTIME-VALIDATED bug fix found by actually booting the server.

v9's REST framework resolves repositories via getBean(category + "." + modelPlural)
(Utils.getResourceRepositoryByCategoryAndModel uses the plural URL segment). CLARIN repos
were registered with the 7.x singular pattern @component(CATEGORY + "." + NAME), so EVERY
CLARIN endpoint returned 404 even though the classes compiled and CI was green.

Fix: add PLURAL_NAME = NAME + "s" to 8 Rest models (ClarinLicense/Label/ResourceMapping/
ResourceUserAllowance, ClarinUserMetadata/UserRegistration/VerificationToken, Handle) and switch
the 8 RestRepositories' @component to PLURAL_NAME (matches the FE data-service link names and
getTypePlural()). compile + checkstyle clean.

Validated on a running v9.3 server against real Postgres (full mvn package -> ant fresh_install ->
dspace database migrate [all CLARIN migrations apply on postgres] -> server-boot.jar):
  /server/api/core/clarinlicenses|clarinlicenselabels|handles -> 200 (valid HAL)
  /server/api/core/clarinlruallowances|clarinusermetadatas    -> 401 (correctly auth-protected)
This is exactly the "compiles != works" gap the independent review warned about.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ful to 7.x)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…— runtime-diagnosed

BE tranche 7 — second runtime-diagnosed bug, found by exercising the write path on a live server.

v9's ConverterService.toRest() enforces access on every BaseObjectRest by reading the @PreAuthorize
SpEL from the resource repository's most-derived findOne() (getPreAuthorizeAnnotationForBaseObject ->
getAnnotationForRestObject). The base DSpaceRestRepository.findOne is abstract with no annotation, and
the CLARIN repos' findOne overrides had none (faithful to 7.x, where ConverterService did NOT do this
check). Result: converting ANY real CLARIN object threw "IllegalArgumentException: 'expressionString'
must not be null or blank" -> 400/500. The empty GETs passed earlier only because there was no row to
convert; the bug surfaced on POST (create) and would hit every non-empty response.

Fix (v9-migration necessity, mirrors vanilla repos like CommunityRestRepository.findOne): add
@PreAuthorize on findOne of the 8 CLARIN BaseObjectRest repositories, with the access level matching
each endpoint's observed findAll semantics:
  permitAll()            -> ClarinLicense, ClarinLicenseLabel, ClarinLicenseResourceMapping, Handle
  hasAuthority('ADMIN')  -> ClarinLicenseResourceUserAllowance, ClarinUserMetadata,
                            ClarinUserRegistration, ClarinVerificationToken
compile + checkstyle clean. Read endpoints already proven 200/401 on a live v9 server against Postgres
(tranche 6); this unblocks object conversion / the create+read round trip.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…n blocked by host RAM

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…featured REST

URGENT FIX: tranche 7 (98771fe) pushed broken code — CI red. Root cause: 7 of the 8 CLARIN
repos ALREADY had @PreAuthorize on findOne in 7.x (only ClarinLicenseLabel lacked it, which was
the real ConverterService gap). My tranche-7 script's fallback regex appended a SECOND @PreAuthorize
to those 7 -> "PreAuthorize is not a repeatable annotation type" (a stale incremental compile masked
it locally; lesson: verify with `clean compile`). Fix: drop the duplicate, restore each repo's
ORIGINAL findOne annotation (permitAll() for license/label/mapping/handle/verificationtoken;
hasAuthority('AUTHENTICATED') for resourceUserAllowance/userMetadata/userRegistration). ClarinLicenseLabel
keeps the single permitAll() I added (it genuinely had none — the actual fix that unblocks conversion).

Also completes the handle feature REST (deps now available):
- RandomStringGenerator + RandomStringGeneratorImpl
- ExternalHandleRestRepository (un-deferred; DCDate(new Date()) -> DCDate.getCurrent() for v9)
- EpicHandleRestController, ClarinFeaturedServiceRestRepository
javax->jakarta, import order. clean compile + checkstyle BUILD SUCCESS.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…step REST (DSpace 9)

BE tranche 8 — 26 more REST files, clean-compile + checkstyle verified (used `clean compile`
this time after the tranche-7 stale-incremental lesson):
- ConfigFile REST (controller/converter/model/hateoas + 3 exceptions) — runtime config file access
- authorization: CanManageLicense feature + ClarinAuthorizationTestController
- ClarinAutoRegistrationController, ClarinUserInfoController
- ClarinLicenseImportController, ClarinHandleImportController (bulk import)
- submission steps: ClarinLicenseDistributionStep/ResourceStep/SubmissionUtils/NoticeStep + 2 validations
- model: refbox DTOs (RefBox/FeaturedService/FeaturedServiceLink/ExportFormat), ClarinDataLicense,
  ShareSubmissionLinkDTO; utils/BigMultipartFile
- javax.mail -> jakarta.mail, javax.* -> jakarta.*, import order. No deferred-service refs; beans wire.

DEFERRED (need prerequisite work, moved to _deferred/): 14 files that require unported vanilla-file
methods (Item.isHidden, Util.replaceLast/normalizeDiscoverQuery, WorkspaceItem share helpers) and/or
extra libs not in pom (org.json.simple, com.hp.hpl.jena RDF): ClarinUserMetadataRestController,
ClarinItemImportController, ClarinEPersonImportController, ClarinUserMetadataImportController,
ClarinGroupRestController, SubmissionController, SuggestionRestController, ClarinRefBoxController,
ClarinShibbolethLoginFilter, SolrOAIReindexer, AuthrnRest/AuthrnResource/AuthorizationRestController,
DBConnectionStatisticsController. To be ported in a later dependency-complete tranche.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant