UFAL/CLARIN-DSpace upgrade v9#1339
Open
milanmajchrak wants to merge 13 commits into
Open
Conversation
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
…vices) First foundation tranche of the CLARIN-DSpace 7.6.5 -> 9.3 forward-port. What's included (compiles, checkstyle + license headers clean, entity/schema validated on h2 via hbm2ddl=validate, migrations applied at DB init): - 19 CLARIN Flyway migrations (h2 x9, postgres x10) at original version numbers (Lindat schema, preview/report/matomo/clarin_token tables, share token, default licenses, 7z format). Purely additive. - ~107 CLARIN dspace-api classes: license framework, user metadata/registration, verification + clarin tokens, item/workspace services, handle service + external handle, shibboleth headers, featured services, provenance, EPIC/PID base, ORCID caching, factories + DAOs. - v9 API adaptation of ported code: javax.* -> jakarta.*, commons-lang -> lang3, NullArgumentException -> IllegalArgumentException, Hibernate 6 ORDINAL enum JdbcTypeCode fix (ClarinLicense.confirmation). - 9 CLARIN entity <mapping> entries in hibernate.cfg.xml. - dspace-api/pom.xml deps: matomo-java-tracker, nimbus-jose-jwt, itextpdf, jfree, zjsonpatch. - Minimal additions to vanilla files: Handle (url/dead/deadSince), Util.formatNetId, HandlePlugin.getRepositoryName/getCanonicalHandlePrefix. Deferred to later tranches (tracked in CLARIN_DSPACE_V9_PROGRESS.md): S3/sync storage (AWS SDK v1->v2), preview, report/health/diff, matomo runtime, PID/EPIC clients, versioning CLI, Spring bean wiring, REST layer, frontend. See CLARIN_DSPACE_V9_PROGRESS.md for full inventory, status and testing evidence. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Tranche 2 of the CLARIN-DSpace 7.6.5 -> 9.3 backend port. Registers the data-layer
services/DAOs/factories ported in the previous commit so they are instantiated by
the DSpace Spring context.
- core-factory-services.xml: clarinServiceFactory, handleClarinServiceFactory
- core-dao-services.xml: 11 CLARIN DAO beans (license/label/mapping/user-registration/
user-metadata/allowance/item/verification-token/matomo-report/token + HandleClarinDAOImpl)
- core-services.xml: 16 service beans (license framework, user metadata/registration,
verification + clarin tokens, item/workspace, matomo report subscription,
DspaceObjectClarin, AuthorizationBitstreamUtils, HandleClarinServiceImpl,
EpicHandleServiceImpl, ProvenanceServiceImpl) + MatomoTracker wired to the existing
v9 ${matomo.tracker.url} (factory requires it via @Autowired).
Deferred-feature beans (bitstream-sync/preview/report/matomo-runtime) intentionally
omitted until those classes are ported (tracked in CLARIN_DSPACE_V9_PROGRESS.md).
Validated: full Spring context boots cleanly (AccessStatusServiceTest 3/3, no
autowiring/bean-creation errors) with all CLARIN beans active.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…egy) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ported items From the independent BE review: - M2: handle/PIDService.java reflectively loads the deferred PIDServiceEPICv2 (Class.forName) -> ClassNotFoundException at runtime for its only path. Deferred it to _deferred/ (the live EPIC path is EpicHandleServiceImpl). dspace-api still compiles (EXIT 0). - HONESTY (progress file §6g + §0): the BE port is a COMPILE-ONLY data/service skeleton, not a runtime-functional CLARIN backend. Documented all previously-untracked silently-not-ported items distinct from _deferred/: 24 CLARIN config files, 57 config modifications, 91 vanilla-.java modifications (only Handle/Util/HandlePlugin applied), 44 CLARIN test classes, 3 entity-mapping gaps (WorkspaceItem.shareToken, EPerson.welcomeInfo/canEditSubmissionMetadata, Item.isHidden), REST/OAI layer. Downgraded §0 "runtime-active" claim. Added §6f Docker/Playwright startup + blockers. Nothing is skipped without a documented reason. See CLARIN_DSPACE_V9_PROGRESS.md §6f/§6g. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
BE tranche 3 — closes documented gaps from the independent review: - 24 ADDED CLARIN config files: clarin-dspace.cfg, OAI crosswalks (lindat_cmdi/olac/ metasharev2/elg/datacite_openaire/bibtex .xsl + descriptions), email templates (clarin_token, clarin_download_link(+admin), clarin_autoregistration, share_submission, matomo_report, report_diff), registries (datacite.xml, edm.xml, metashare-schema.xml), submission-forms_cs.xml, default_cs.license, features/enable-orcid.cfg, spiders, VERSION_D. - Entity columns matching migrations (were unmapped → features inert): WorkspaceItem.shareToken (share-submission), EPerson.welcomeInfo + canEditSubmissionMetadata. dspace-api compiles (EXIT 0). Still TODO (documented §6g): 57 MODIFIED config files (dspace.cfg include of clarin-dspace.cfg, item-submission.xml, authentication-shibboleth.cfg, discovery.xml), remaining vanilla-file method additions + their service wiring, REST/OAI layer, CLARIN tests. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
BE tranche 4 — first slice of the REST layer (the #1 functional blocker). Exposes the /server/api/core/clarinlicenses, clarinlicenselabels, clarinlicenseresourcemappings, clarinlruallowances endpoints the FE clarin-licenses module already consumes. - 17 files: Rest models (ClarinLicense/Label/ResourceMapping/ResourceUserAllowance) + hateoas Resources + Converters + RestRepositories + ClarinLicenseNotFoundException. - v9 adaptation: implement new RestModel.getTypePlural() (returns NAME+"s", matching the FE link names clarinlicenses/clarinlicenselabels/...); javax->jakarta; import order. - Compiles (dspace-server-webapp EXIT 0) + checkstyle clean. Wires to the already-ported dspace-api ClarinLicense* services/DAOs (tranches 1-2). Remaining REST (documented §6g): handle/epic-handle, user-metadata/registration, token, import controllers, submission steps, OAI/CMDI. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
BE tranche 5 — more of the REST layer. Exposes endpoints for handle, epic-handle, user-metadata, user-registration, verification-token, featured-service + their link repositories (CLRUA<->user-metadata/registration, user-registration<->license, resource-mapping<->license). - 31 files: Rest models + hateoas Resources + Converters + RestRepositories + LinkRepositories. - v9 adaptations: RestModel.getTypePlural() on all new models; javax->jakarta; import order. - Compiles (dspace-server-webapp EXIT 0) + checkstyle clean. - Deferred: ExternalHandleRestRepository (magic-link external handle; needs a RandomStringGenerator bean + DCDate.getCurrent() + external-handle flow not yet ported) -> _deferred/. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…e validation BE tranche 6 — RUNTIME-VALIDATED bug fix found by actually booting the server. v9's REST framework resolves repositories via getBean(category + "." + modelPlural) (Utils.getResourceRepositoryByCategoryAndModel uses the plural URL segment). CLARIN repos were registered with the 7.x singular pattern @component(CATEGORY + "." + NAME), so EVERY CLARIN endpoint returned 404 even though the classes compiled and CI was green. Fix: add PLURAL_NAME = NAME + "s" to 8 Rest models (ClarinLicense/Label/ResourceMapping/ ResourceUserAllowance, ClarinUserMetadata/UserRegistration/VerificationToken, Handle) and switch the 8 RestRepositories' @component to PLURAL_NAME (matches the FE data-service link names and getTypePlural()). compile + checkstyle clean. Validated on a running v9.3 server against real Postgres (full mvn package -> ant fresh_install -> dspace database migrate [all CLARIN migrations apply on postgres] -> server-boot.jar): /server/api/core/clarinlicenses|clarinlicenselabels|handles -> 200 (valid HAL) /server/api/core/clarinlruallowances|clarinusermetadatas -> 401 (correctly auth-protected) This is exactly the "compiles != works" gap the independent review warned about. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ful to 7.x) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…— runtime-diagnosed BE tranche 7 — second runtime-diagnosed bug, found by exercising the write path on a live server. v9's ConverterService.toRest() enforces access on every BaseObjectRest by reading the @PreAuthorize SpEL from the resource repository's most-derived findOne() (getPreAuthorizeAnnotationForBaseObject -> getAnnotationForRestObject). The base DSpaceRestRepository.findOne is abstract with no annotation, and the CLARIN repos' findOne overrides had none (faithful to 7.x, where ConverterService did NOT do this check). Result: converting ANY real CLARIN object threw "IllegalArgumentException: 'expressionString' must not be null or blank" -> 400/500. The empty GETs passed earlier only because there was no row to convert; the bug surfaced on POST (create) and would hit every non-empty response. Fix (v9-migration necessity, mirrors vanilla repos like CommunityRestRepository.findOne): add @PreAuthorize on findOne of the 8 CLARIN BaseObjectRest repositories, with the access level matching each endpoint's observed findAll semantics: permitAll() -> ClarinLicense, ClarinLicenseLabel, ClarinLicenseResourceMapping, Handle hasAuthority('ADMIN') -> ClarinLicenseResourceUserAllowance, ClarinUserMetadata, ClarinUserRegistration, ClarinVerificationToken compile + checkstyle clean. Read endpoints already proven 200/401 on a live v9 server against Postgres (tranche 6); this unblocks object conversion / the create+read round trip. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…n blocked by host RAM Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…featured REST URGENT FIX: tranche 7 (98771fe) pushed broken code — CI red. Root cause: 7 of the 8 CLARIN repos ALREADY had @PreAuthorize on findOne in 7.x (only ClarinLicenseLabel lacked it, which was the real ConverterService gap). My tranche-7 script's fallback regex appended a SECOND @PreAuthorize to those 7 -> "PreAuthorize is not a repeatable annotation type" (a stale incremental compile masked it locally; lesson: verify with `clean compile`). Fix: drop the duplicate, restore each repo's ORIGINAL findOne annotation (permitAll() for license/label/mapping/handle/verificationtoken; hasAuthority('AUTHENTICATED') for resourceUserAllowance/userMetadata/userRegistration). ClarinLicenseLabel keeps the single permitAll() I added (it genuinely had none — the actual fix that unblocks conversion). Also completes the handle feature REST (deps now available): - RandomStringGenerator + RandomStringGeneratorImpl - ExternalHandleRestRepository (un-deferred; DCDate(new Date()) -> DCDate.getCurrent() for v9) - EpicHandleRestController, ClarinFeaturedServiceRestRepository javax->jakarta, import order. clean compile + checkstyle BUILD SUCCESS. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…step REST (DSpace 9) BE tranche 8 — 26 more REST files, clean-compile + checkstyle verified (used `clean compile` this time after the tranche-7 stale-incremental lesson): - ConfigFile REST (controller/converter/model/hateoas + 3 exceptions) — runtime config file access - authorization: CanManageLicense feature + ClarinAuthorizationTestController - ClarinAutoRegistrationController, ClarinUserInfoController - ClarinLicenseImportController, ClarinHandleImportController (bulk import) - submission steps: ClarinLicenseDistributionStep/ResourceStep/SubmissionUtils/NoticeStep + 2 validations - model: refbox DTOs (RefBox/FeaturedService/FeaturedServiceLink/ExportFormat), ClarinDataLicense, ShareSubmissionLinkDTO; utils/BigMultipartFile - javax.mail -> jakarta.mail, javax.* -> jakarta.*, import order. No deferred-service refs; beans wire. DEFERRED (need prerequisite work, moved to _deferred/): 14 files that require unported vanilla-file methods (Item.isHidden, Util.replaceLast/normalizeDiscoverQuery, WorkspaceItem share helpers) and/or extra libs not in pom (org.json.simple, com.hp.hpl.jena RDF): ClarinUserMetadataRestController, ClarinItemImportController, ClarinEPersonImportController, ClarinUserMetadataImportController, ClarinGroupRestController, SubmissionController, SuggestionRestController, ClarinRefBoxController, ClarinShibbolethLoginFilter, SolrOAIReindexer, AuthrnRest/AuthrnResource/AuthorizationRestController, DBConnectionStatisticsController. To be ported in a later dependency-complete tranche. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem description
Analysis
(Write here, if there is needed describe some specific problem. Erase it, when it is not needed.)
Problems
(Write here, if some unexpected problems occur during solving issues. Erase it, when it is not needed.)
Manual Testing (if applicable)
Copilot review