feat(sitesearch): vendor-neutral Site Search — neutral aggregation + OpenSearch impl + phase-aware router (#35786)#36282
Conversation
Decouple SiteSearchAPI/SiteSearchWebAPI from Elasticsearch aggregation types so Site Search can be served by OpenSearch in Phase 3. - Reuse the existing neutral com.dotcms.content.index.domain.Aggregation / AggregationBucket DTOs (from #36026) instead of a new IndexAggregation - Add neutral DotSearchException (unchecked) to replace ElasticsearchException on the public API surface - SiteSearchAPI: drop org.elasticsearch.* imports; neutral Aggregation return type; createSiteSearchIndex throws DotSearchException - SiteSearchWebAPI: remove InternalDateHistogram/StringTerms/Bucket casts and the Joda DateTime import; getFacets distinguishes histogram vs terms by aggregation type and feeds the legacy wrappers neutral buckets - ESSiteSearchAPI: adapt ES results via Aggregation.from(); ES exception throws -> DotSearchException - Add date/numeric histogram support to the neutral Aggregation ES factory (also fixes a latent CCE: the old getFacets cast the histogram key to Joda DateTime, which is a java.time.ZonedDateTime in ES 7.x) OSSiteSearchAPI is deferred to #34609 (not yet in the codebase); Aggregation.fromOS() is already in place for it. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
#35786) Completes the vendor-neutral Site Search extraction begun in #35786 by adding the OpenSearch implementation and a phase-aware router, so Site Search dual-writes and reads correctly across the ES -> OS migration phases. - OSSiteSearchAPI: @ApplicationScoped @default OpenSearch implementation of SiteSearchAPI. Search/aggregations via the generic client -> ContentSearchResponse (mirrors OSSearchAPIImpl); doc put/delete via _doc PUT/DELETE; get via typed client.get(...). Default site-search index resolved from VersionedIndicesAPI (not the deprecated IndiciesAPI). Index names handled in logical space; the .os tag forced by VersionedIndicesAPI is stripped on read. - SiteSearchAPIImpl: PhaseRouter<SiteSearchAPI> router mirroring IndexAPIImpl and acting as the single fan-out point. Reads -> read provider; doc/index writes -> write fan-out; listIndices/listClosedIndices merge in dual-write; Quartz task methods route to a single provider (fan-out would double-schedule jobs). - ESSiteSearchAPI: use raw ESIndexAPI instead of the IndexAPI router so the SiteSearch router is the only fan-out point (avoids double dual-write). - APILocator: SITESEARCH_API now returns SiteSearchAPIImpl. - OSSiteSearchAPIIntegrationTest: lifecycle, doc round-trip, aggregations, and default-index activation; registered in OpenSearchUpgradeSuite. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
CI (OpenSearch Upgrade Suite) failed: every OSSiteSearchAPIIntegrationTest that creates an index errored with "Failed to parse index settings". The OS impl was loading es-sitesearch-settings.json, whose ES-only token-filter syntax (edgeNGram, side) is rejected by the typed OpenSearch IndexSettings deserializer in OSIndexAPIImpl.createIndex. Add os-sitesearch-settings.json declaring the same analyzers (standard_content, partial_content, comma_analyzer) in OpenSearch syntax (edge_ngram, no side), and load it from OSSiteSearchAPI.createSiteSearchIndex. The mapping is vendor-neutral and reused as-is. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…l index The aggregation IT failed: mimeType aggregation hit "Text fields are not optimised ... use a keyword field". Root cause: createSiteSearchIndex delegated the mapping PUT to MappingOperationsOS, which force-tags the physical name with `.os`. Site search uses untagged logical names, so the mapping landed on a different (`.os`) index while the real index kept the dynamic default mapping (string -> text), breaking keyword aggregations. Apply the mapping with a raw PUT /<index>/_mapping against the same untagged physical name used by createIndex/search/put, and drop the MappingOperationsOS dependency. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…5786) Adds SiteSearchWebAPITest covering the view-tool surface affected by the neutral-aggregation refactor: search() (default-index, alias, pagination, empty and error paths) with full SiteSearchResults/SiteSearchResult field assertions; getAggregations() over the neutral Aggregation/AggregationBucket tree (terms, nested top_hits, numeric-histogram getKeyAsNumber); and getFacets() across all three legacy wrappers (string-terms, count-histogram, plain Facet fallback). Registered in MainSuite1b alongside ContentSearchToolTest. Also a minor List.getFirst() cleanup in SiteSearchAPIImpl. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…apping Two OpenSearch site-search regressions surfaced by the dual-write fan-out: 1. Shared mutable result across the fan-out. SiteSearchAPIImpl.putToIndex handed the same SiteSearchResult to both leaves. putToIndex mutates the backing map (setKeywords rewrites "keywords" String -> List), so the first leaf (ES) corrupted the input the second leaf (OS) then read, throwing ClassCastException: EmptyList cannot be cast to String and silently dropping every document from OpenSearch. The router now copies the result (and each element of the batch overload) per provider. 2. Mapping fan-out leak. ESSiteSearchAPI.createSiteSearchIndex applied its mapping through the phase-dispatched ESMappingAPIImpl.putMapping, which fanned out a second time to OpenSearch using a .os-tagged physical name that site-search OS indices never use -> HTTP 404. Pinned the ES leaf to IndexTag.ES, restoring the single-fan-out invariant (SiteSearchAPIImpl already drives OSSiteSearchAPI, which owns its own untagged OS index + mapping). Adds SiteSearchDualWriteRouterIT (registered in OpenSearchUpgradeSuite) which drives the router in Phase 1 dual-write and asserts documents reach OpenSearch (single + batch) — the isolated OS-leaf IT cannot reproduce either bug. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
🤖 Bedrock Review —
|
…index path The OpenSearch site-search create path loaded its settings from os-sitesearch-settings.json but reused es-sitesearch-mapping.json for the mapping. The mapping is functionally OS-compatible (its analyzers exist in the OS settings), but reading an es-*.json resource from the OS lifecycle couples the two vendors: a future ES-only mapping change would silently alter OS. Adds os-sitesearch-mapping.json (identical content today) and points OSSiteSearchAPI.createSiteSearchIndex at it, mirroring the settings split so ES and OS own their resources independently. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
❌ Bedrock Review failed —
|
Decision — neutral
|
| ES runtime accessor | Neutral equivalent |
|---|---|
agg.getName() / agg.getType() |
getName() / getType() |
(Terms/Histogram).getBuckets() |
getBuckets() |
bucket.getKeyAsString() |
getKeyAsString() |
bucket.getKeyAsNumber() |
getKeyAsNumber() (lenient parse; date-histogram key normalized to epoch-millis) |
bucket.getDocCount() |
getDocCount() |
bucket.getAggregations() (nested) |
getAggregations() |
topHits.getHits()… (id / source / score) |
getHits() → SearchHits / SearchHit |
This is verified end-to-end (SiteSearchWebAPITest, ES Phase 0), on the OpenSearch path (OSSiteSearchAPIIntegrationTest, fromOS), and at the factory level deterministically (AggregationDomainTest, incl. the date-histogram ZonedDateTime→epoch-millis conversion).
Deliberate gaps (NOT reproduced — accepted)
These ES members are intentionally not exposed on the neutral types. They are analytics/lookup methods with no observed use in Site Search VTL, and reproducing them would re-introduce vendor coupling or carry no OpenSearch counterpart:
Aggregation.getMetadata()— the per-aggregationmetamap. Not surfaced.Terms.getBucketByKey(String)— random bucket lookup. Templates iterate buckets instead; a template doing$agg.getBucketByKey("x")would break.Terms.getSumOfOtherDocCounts()/Terms.Bucket.getDocCountError()/Terms.getDocCountError()— approximate-count error/overflow accounting. Not surfaced.bucket.getKey()type narrowedObject→String— for a date histogram, ESgetKey()returned aZonedDateTime; the neutralgetKey()returns the epoch-millis as a String. A template calling date methods on the raw key ($bucket.key.toInstant()) would break. The supported idioms aregetKeyAsString()/getKeyAsNumber(), which are preserved.
Risk: a customer/site VTL template relying on (2), (3), or the Object shape of (4) would fail after this change. We assess this as low likelihood — these are not part of the documented Site Search facet/aggregation idiom — and accept it knowingly rather than re-couple the neutral API to vendor types.
Interaction with the H-8 rollback label
The deliberate gaps are a forward-compatibility consideration (a template adopting the neutral shape on N), distinct from the rollback (H-8) concern already flagged on this PR (a template adopting the new accessors getBuckets()/getHits()/iterator() and then being rolled back to N-1, whose getAggregations() returns the ES type). No template was co-migrated in this PR, so the H-8 risk window stays theoretical until one is. If/when a template is migrated to the neutral accessors, it must be tracked against the N+1 retirement of the old ES return type per H-8's two-phase guidance.
If a gap proves needed
Add the specific accessor to the neutral type (e.g. a metadata() map, or a getBucketByKey helper on Aggregation) backed by both from(...) and fromOS(...) factories, with a unit test in AggregationDomainTest — rather than leaking the vendor type back onto the API.
Update —
|
…ap + aggregation tests (#35786) Address PR #36282 review feedback and aggregation test gaps: - OSSiteSearchAPI.putToIndex/deleteFromIndex now PROPAGATE DotSearchException instead of swallowing, so PhaseRouter can apply its per-phase policy (Phase 3 OS-primary -> surfaced; shadow phases -> logged WARN by the router) - setAlias returns true on success (was incorrectly false) - add requireValidIndexName guard (null/blank, non-lowercase, OS-forbidden chars) before interpolating the index name into the REST endpoint - read os-sitesearch mapping/settings via getResourceAsStream (JAR-safe; was new File(url.getPath()) which NPEs if missing and fails inside a JAR) - lower per-doc put/delete logs from info to debug (Supplier form) - close the getMetadata() equivalence gap on the neutral Aggregation record (rollback-safe: the ES Aggregation interface exposes it too), populated from both the ES and OpenSearch factories - tests: date_histogram (ZonedDateTime->epoch-millis) coverage in AggregationDomainTest + SiteSearchWebAPITest; strengthen the OS IT to assert bucket content/keys/counts + nested top_hits on the fromOS path; index-name validation IT cases Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Review feedback addressed — pushed in
|
| Sev | Finding | Resolution |
|---|---|---|
| 🔴 | OSSiteSearchAPI.putToIndex swallows write failures (silent data loss) |
Now throws DotSearchException on non-2xx / caught exceptions. It must propagate, not log-and-swallow: PhaseRouter is already phase-aware — it re-throws the primary provider's failure and swallows the shadow's (WARN). Swallowing inside the impl defeated that, so a Phase-3 (OS-primary) write failure was lost. Now publishing observes it in Phase 3; OS shadow drift in phases 1/2 is logged WARN by the router. |
| 🔴 | deleteFromIndex swallows failures |
Same fix; HTTP 404 stays benign (idempotent delete). |
| 🟠 | setAlias returns false on success |
Returns true (createAlias is void + throws on failure → reaching the return = success). |
| 🟠 | Missing indexName validation → NPE / invalid chars reach OpenSearch |
New requireValidIndexName(idx) guard (null/blank, non-lowercase, OS-forbidden chars) fails fast with IllegalArgumentException before the name is interpolated into the REST endpoint. |
| 🟠 | Mapping/settings read via new File(url.getPath()) — NPEs if missing, fails inside a JAR |
Read via getResourceAsStream (readResource helper, UTF-8, clear error if absent). JAR-safe. |
| 🟡 | Per-doc Logger.info in put/delete (log spam + URL at info) |
Lowered to Logger.debug(..., Supplier) (no concatenation when off). Bulk delete log stays at info. |
| 🟡 | setAlias missing alias null/empty check |
Already present (the `UtilMethods.isNotSet(indexName) |
Deliberate scope note: ESSiteSearchAPI.putToIndex/deleteFromIndex also swallow failures (legacy, pre-existing). Left unchanged — in Phase 3 OS is the sole writer so the OS fix covers the critical case, and making the legacy ES path throw would surface previously-hidden failures to existing Phase-0/1 deployments (riskier, out of scope here).
Aggregation test coverage + equivalence
AggregationDomainTest— added deterministic ES-factory unit tests (Mockito, no container), incl. the previously-untesteddate_histogramZonedDateTime→epoch-millis key conversion. 11/11 pass.SiteSearchWebAPITest— addeddate_histogramcases (bothgetAggregationsand the legacygetFacetswrapper).OSSiteSearchAPIIntegrationTest— strengthened the terms-agg assertion (bucket content/keys/counts, was onlycontainsKey), added a nestedtop_hitscase on thefromOSpath, and index-name validation cases.getMetadata()equivalence gap closed — see the dedicated comment above; it's the one gap that's rollback-safe (the ESAggregationinterface exposes it too), so it completes interface equivalence without adding H-8 surface.getBucketByKey/getSumOfOtherDocCounts/getDocCountErrorremain deliberately deferred (no consumer; verified no shipped template uses$sitesearchaggregations).
Verification
./mvnw install -pl :dotcms-core and test-compile -pl :dotcms-integration → BUILD SUCCESS (Java 25). AggregationDomainTest 11/11. The OpenSearch ITs require the opensearch-upgrade container to run.
❌ Bedrock Review failed —
|
|
Claude finished @fabrizzio-dotCMS's task in 5m 32s —— View job Rollback Safety Analysis
Verdict: ✅ Safe To Rollback I analyzed the diff against every category in the rollback-unsafe reference. No unsafe category is triggered. Here is the category-by-category reasoning:
H-8 — VTL Viewtool contract — detailed assessment
M-4 — OSGi interface note
|
❌ Bedrock Review failed —
|
…ion path SearchHit.from(OpenSearch Hit) dropped the document source for nested top_hits: TopHitsAggregate hits are HitsMetadata<JsonData>, so Hit.source() is a JsonData, not a Map — it fell through to the empty-map fallback. Unwrap JsonData via to(Map.class) so the _source survives. Fixes the deterministic failure of OSSiteSearchAPIIntegrationTest.test_getAggregations_nestedTopHits_preservedOnOpenSearchPath on the OpenSearch Upgrade Suite. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
🤖 Bedrock Review —
|
Note for reviewers — the latest
|
| Reported (stale) | Actual current state |
|---|---|
putToIndex silently swallows write failures |
Fixed — now throw new DotSearchException(...) on non-2xx / caught exceptions (OSSiteSearchAPI.java ~595/606); PhaseRouter applies the per-phase primary/shadow policy. |
deleteFromIndex no index/idx validation, NPE risk |
Fixed — requireValidIndexName(idx) guard before the REST call (~553/647); 404 stays benign by design. |
putToIndex logs raw exception without context |
Fixed — the thrown message includes docId + index. |
putToIndex uses ESMappingAPIImpl.toJsonString on the OS path |
Intentional — faithful ES-format doc JSON is the documented design for the OS site-search doc body, not a regression. |
SiteSearchAPIImpl:227 — copyOf() shallow-copy "concurrent writes corrupt state" |
False positive — the fan-out is sequential, not concurrent; each provider gets an independent map copy; every putToIndex mutation is a top-level map.put(...) (entry replacement, not nested-object mutation), so a shallow copy is sufficient. This reasoning is already documented in the method's javadoc (SiteSearchAPIImpl.java ~240-245). |
A prior review pass (deepseek r1) had already marked these Resolved; the later qwen pass simply didn't re-diff. The substantive review feedback was addressed in ba8e9645cf (see the "Review feedback addressed" comment) and 86bf0fbb44, and the rollback-safety check now labels the PR Safe To Rollback.
🤖 Bedrock Review —
|
What & why
Closes the Site Search portion of the ES → OpenSearch migration (#35786). Site Search is decoupled from Elasticsearch types and given a working OpenSearch backend plus a phase-aware router, so it dual-writes and reads correctly across all migration phases.
Two commits:
org.elasticsearch.*from theSiteSearchAPIcontract andSiteSearchWebAPI, reusing the existingcom.dotcms.content.index.domain.Aggregation/AggregationBucketDTOs (from Aggregation return-type change breaks existing VTL templates accessing $results.aggregations #36026) with histogram support, and introducesDotSearchException.OSSiteSearchAPI, theSiteSearchAPIImplphase router, and an integration test.Changes
SiteSearchAPI/SiteSearchWebAPIgetAggregations/getFacetsreturn neutralAggregation;DotSearchExceptionaddedOSSiteSearchAPI(new)@ApplicationScoped @DefaultOpenSearch impl. Search/aggregations via the generic client →ContentSearchResponse(mirrorsOSSearchAPIImpl); doc put/delete via_docPUT/DELETE; get via typedclient.get(...). Default index resolved fromVersionedIndicesAPI(not the deprecatedIndiciesAPI)SiteSearchAPIImpl(new, router)PhaseRouter<SiteSearchAPI>mirroringIndexAPIImpl; the single fan-out point. Reads → read provider; doc/index writes → write fan-out;listIndices/listClosedIndicesmerge in dual-write; Quartz task methods route to a single provider (fan-out would double-schedule jobs)ESSiteSearchAPIESIndexAPIinstead of theIndexAPIrouter so the SiteSearch router is the only fan-out point (avoids double dual-write of OS indices)APILocatorSITESEARCH_APInow returnsSiteSearchAPIImplDesign notes
ESSiteSearchAPIin the enterprise package (license-gated feature). The singleannotatedbeans.xmlcovers the mergedtarget/classes, so CDI still discovers the@Defaultbean.VersionedIndicesAPIforce-tags.oson store/load, so the default isIndexTag.strip(...)-ed on read.deactivateIndexcallsremoveVersion(...)when removing the slot would leave the version empty (saveIndicesrejects empty).SearchHitDTO carries no highlights, so OSsearch()returns empty highlight arrays (the ES path is best-effort too) — markedTODO OS.Testing
./mvnw compile -pl :dotcms-core→ BUILD SUCCESS (Java 25)./mvnw test-compile -pl :dotcms-integration -am→ BUILD SUCCESSOSSiteSearchAPIIntegrationTest(registered inOpenSearchUpgradeSuite) covers lifecycle, doc round-trip, aggregations, and default-index activation. Requires theopensearch-upgradecontainer:🤖 Generated with Claude Code