Skip to content

fix(search): never set live refresh_interval to -1 on reindex promotion#29153

Open
harshach wants to merge 1 commit into
mainfrom
harshach/searchindex-refresh-interval-regression
Open

fix(search): never set live refresh_interval to -1 on reindex promotion#29153
harshach wants to merge 1 commit into
mainfrom
harshach/searchindex-refresh-interval-regression

Conversation

@harshach

@harshach harshach commented Jun 18, 2026

Copy link
Copy Markdown
Collaborator

Describe your changes:

I added a guard in DefaultRecreateHandler.pickRefreshInterval so a refresh_interval of -1 (refresh disabled) is never applied as a live serving value during reindex promotion, because a misconfigured liveIndexSettings.refreshInterval=-1 was being re-applied verbatim to the promoted index and left newly indexed documents unsearchable until a manual _refresh (the "reindex finishes but the page is empty" symptom). When the resolved live value would be -1, it is now overridden to 1s and a warning is logged so operators can spot the bad config. Both promotion paths — centralized finalizeReindex and per-entity promoteEntityIndex — funnel through this method, so the guard covers both flows. Added a unit test asserting the revert JSON yields refresh_interval:"1s" and never -1.

Type of change:

  • Bug fix

High-level design:

N/A — small, self-contained change to the reindex live-settings revert.

Tests:

Use cases covered

  • Reindex promotion where the saved liveIndexSettings.refreshInterval is -1: the promoted index is left with a near-real-time 1s refresh instead of a disabled refresh, so create-then-search returns results without a manual _refresh.

Unit tests

  • Added DefaultRecreateHandlerTest.BuildRevertJsonTests#liveRefreshDisabledIsOverriddenToDefault — RED without the fix (the old code returned the live -1).
  • Verified: mvn -pl openmetadata-service test -Dtest=DefaultRecreateHandlerTest → 39 tests, 0 failures.

Backend integration tests

  • Not applicable (no API change; pure internal index-settings logic, covered by unit test).

Ingestion integration tests

  • Not applicable (no ingestion changes).

Playwright (UI) tests

  • Not applicable (no UI changes).

Manual testing performed

Verified via the unit test above; no local stack run required for this internal logic change.

UI screen recording / screenshots:

Not applicable.

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR is linked to a GitHub issue via Fixes #<issue-number> above.
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.
  • For UI changes: N/A.
  • I have added tests and listed them above.
  • I have added a test that covers the exact scenario we are fixing.

Greptile Summary

This fix guards against a misconfigured liveIndexSettings.refreshInterval=-1 being re-applied verbatim to a promoted index during reindex, which left newly indexed documents invisible to search until a manual _refresh. A new isRefreshDisabled helper and a warn+override in pickRefreshInterval ensure the live value is always at least 1s.

  • Adds isRefreshDisabled(String) helper and overrides any -1 refresh interval to \"1s\" with a warning log, covering both the finalizeReindex and promoteEntityIndex promotion paths.
  • Adds a unit test asserting that buildRevertJson with live.refreshInterval=\"-1\" emits \"1s\", though the isolated case of live=\"-1\" with bulk=null is not separately exercised.

Confidence Score: 4/5

Safe to merge; the change is narrowly scoped to the reindex promotion path and the override only fires for the pathological -1 value.

The production logic is correct and the fix covers both promotion paths. The one gap is that the test exercises both live and bulk set to -1 simultaneously, so the simpler misconfiguration of live=-1 with no active bulk run is not directly pinned by a test — a future refactor could break that branch silently.

The test file would benefit from an additional case covering live="-1" with bulk=null.

Important Files Changed

Filename Overview
openmetadata-service/src/main/java/org/openmetadata/service/search/DefaultRecreateHandler.java Adds a guard in pickRefreshInterval that prevents a live refresh_interval of -1 from being applied on index promotion; overrides it to 1s and logs a warning. Logic refactored from two early-returns to a single isRefreshDisabled helper — semantics preserved for all existing paths.
openmetadata-service/src/test/java/org/openmetadata/service/search/DefaultRecreateHandlerTest.java Adds liveRefreshDisabledIsOverriddenToDefault test asserting the guard produces 1s and never -1. Test exercises the scenario where both live and bulk carry -1, but the pure live=-1, bulk=null case (the most direct form of the misconfiguration) is not explicitly covered.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[pickRefreshInterval called\nlive, bulk] --> B{live != null\n&& live.refreshInterval != null?}
    B -- Yes --> C[refreshInterval = live.refreshInterval]
    B -- No --> D{bulk != null\n&& bulk.refreshInterval != null?}
    D -- Yes --> E[refreshInterval = '1s'\ndefault live value]
    D -- No --> F[refreshInterval = null]
    C --> G{isRefreshDisabled?\nrefreshInterval.trim == '-1'}
    E --> G
    F --> G
    G -- Yes --> H[LOG.warn\nOverride to '1s']
    H --> I[refreshInterval = '1s']
    G -- No --> J[return refreshInterval as-is]
    I --> J
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
    A[pickRefreshInterval called\nlive, bulk] --> B{live != null\n&& live.refreshInterval != null?}
    B -- Yes --> C[refreshInterval = live.refreshInterval]
    B -- No --> D{bulk != null\n&& bulk.refreshInterval != null?}
    D -- Yes --> E[refreshInterval = '1s'\ndefault live value]
    D -- No --> F[refreshInterval = null]
    C --> G{isRefreshDisabled?\nrefreshInterval.trim == '-1'}
    E --> G
    F --> G
    G -- Yes --> H[LOG.warn\nOverride to '1s']
    H --> I[refreshInterval = '1s']
    G -- No --> J[return refreshInterval as-is]
    I --> J
Loading

Reviews (1): Last reviewed commit: "fix(search): never apply refresh_interva..." | Re-trigger Greptile

Greptile also left 1 inline comment on this PR.

…on promote

A misconfigured liveIndexSettings.refreshInterval of "-1" (refresh disabled)
was re-applied verbatim to the promoted index by pickRefreshInterval, leaving
newly indexed documents unsearchable until a manual _refresh -- the "reindex
finishes but the page is empty" symptom. Guard the live-revert so a disabled
refresh is never used as the live value: override to 1s and log a warning.
Both promotion paths (finalizeReindex and promoteEntityIndex) funnel through
pickRefreshInterval, so the guard covers the centralized and per-entity flows.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions

github-actions Bot commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

❌ PR checklist incomplete

This PR cannot be merged until the following are addressed on its linked issue:

  • No GitHub issue is linked. Link an issue in the Development section of the PR (or add Fixes #12345 to the description). For a same-org cross-repo issue, add Fixes open-metadata/<repo>#123 to the description.

The fields live on the linked issue in the Shipping project (open the issue → right sidebar → Projects). After you set them, re-run this check (or push a commit) — issue/project changes do not re-trigger it automatically.

Maintainers can bypass this check by adding the skip-pr-checks label.

@github-actions github-actions Bot added backend safe to test Add this label to run secure Github workflows on PRs labels Jun 18, 2026
@gitar-bot

gitar-bot Bot commented Jun 18, 2026

Copy link
Copy Markdown
Code Review ✅ Approved

Prevents the reindex promotion from applying a disabled refresh_interval (-1) to live indexes by overriding it to 1s. This ensures newly indexed documents remain searchable, and the fix is verified by a new unit test.

Options

Display: compact → Showing less information.

Comment with these commands to change:

Compact
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

Comment on lines +1460 to +1476
@Test
@DisplayName("Live refresh_interval '-1' is never applied as a live value (guard -> 1s)")
void liveRefreshDisabledIsOverriddenToDefault() {
// Misconfiguration: liveIndexSettings.refreshInterval is "-1" (refresh disabled) — e.g.
// copied from the bulk side or a stale saved config. Without the guard the revert would
// faithfully re-apply "-1" to the promoted index, leaving it unsearchable until a manual
// _refresh (the "reindex finishes but the page is empty" symptom). The revert must override
// it back to the near-real-time default.
org.openmetadata.schema.system.IndexSettings live =
new org.openmetadata.schema.system.IndexSettings().withRefreshInterval("-1");
org.openmetadata.schema.system.BulkIndexOverrides bulk =
new org.openmetadata.schema.system.BulkIndexOverrides().withRefreshInterval("-1");
String json = DefaultRecreateHandler.buildRevertJson(live, bulk);
assertNotNull(json);
assertTrue(json.contains("\"refresh_interval\":\"1s\""));
assertFalse(json.contains("\"refresh_interval\":\"-1\""));
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Test doesn't isolate the primary bug scenario

The new test passes -1 for both live and bulk, but because the live branch wins (first if in pickRefreshInterval), bulk's value never influences the result. The scenario where the misconfiguration exists on live alone (live="-1", bulk=null) — i.e., no bulk-reindex was active — is the most direct form of the bug and goes untested. A separate parameterized case for bulk=null would pin that path and prevent a future refactor from inadvertently skipping the guard only when bulk is absent.

@sonarqubecloud

Copy link
Copy Markdown

@github-actions

Copy link
Copy Markdown
Contributor

🟡 Playwright Results — all passed (17 flaky)

✅ 4294 passed · ❌ 0 failed · 🟡 17 flaky · ⏭️ 88 skipped

Shard Passed Failed Flaky Skipped
🟡 Shard 1 301 0 1 4
🟡 Shard 2 811 0 1 9
🟡 Shard 3 812 0 1 8
🟡 Shard 4 852 0 5 12
🟡 Shard 5 730 0 3 47
🟡 Shard 6 788 0 6 8
🟡 17 flaky test(s) (passed on retry)
  • Pages/Roles.spec.ts › Roles page should work properly (shard 1, 1 retry)
  • Features/DataQuality/ColumnLevelTests.spec.ts › Column Values Length To Be Between (shard 2, 1 retry)
  • Features/Tasks/TaskNavigation.spec.ts › navigating to /table/TASK-XXXXX should show 404 (invalid URL pattern) (shard 3, 1 retry)
  • Pages/CustomProperties.spec.ts › Time (shard 4, 1 retry)
  • Pages/CustomProperties.spec.ts › Integer (shard 4, 1 retry)
  • Pages/CustomProperties.spec.ts › Hyperlink (shard 4, 1 retry)
  • Pages/DataContracts.spec.ts › Create Data Contract and validate for Directory (shard 4, 1 retry)
  • Pages/Entity.spec.ts › Tier Add, Update and Remove (shard 4, 1 retry)
  • Pages/Entity.spec.ts › Delete Chart (shard 5, 1 retry)
  • Pages/EntityDataSteward.spec.ts › Tier Add, Update and Remove (shard 5, 1 retry)
  • Pages/ExplorePageRightPanel_KnowledgeCenter.spec.ts › Should remove user owner for knowledgeCenter (shard 5, 2 retries)
  • Features/AutoPilot.spec.ts › Agents created by AutoPilot should be deleted (shard 6, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
  • Pages/Lineage/LineageRightPanel.spec.ts › Verify custom properties tab IS visible for supported type: searchIndex (shard 6, 1 retry)
  • Pages/Lineage/PlatformLineage.spec.ts › Verify domain platform view (shard 6, 1 retry)
  • Pages/ServiceListing.spec.ts › should render the service listing page (shard 6, 1 retry)
  • Pages/UserDetails.spec.ts › Admin user can get all the roles hierarchy and edit roles (shard 6, 1 retry)

📦 Download artifacts

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend safe to test Add this label to run secure Github workflows on PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants