Rebase 32x-default-compression on main #3371
Signed-off-by: Kunal Kotwani <kkotwani@amazon.com>
Signed-off-by: Kunal Kotwani <kkotwani@amazon.com>
…fields (#3324) Signed-off-by: Navneet Verma <navneev@amazon.com>
Signed-off-by: shreyah963 <shreyab963@gmail.com>
Signed-off-by: Kunal Kotwani <kkotwani@amazon.com> Co-authored-by: Tejas Shah <shatejas@amazon.com>
Signed-off-by: Sayali Gaikawad <gaiksaya@amazon.com>
Signed-off-by: opensearch-ci-bot <opensearch-infra@amazon.com>
* Fix derived source for mixed-case vector fields
  Signed-off-by: Wonjae Lee <wonjae.lee@dremio.com>
* Add BWC coverage for derived source field casing
  Signed-off-by: Wonjae Lee <wonjae.lee@dremio.com>
* Add changelog entry for mixed-case derived source fix
  Signed-off-by: Wonjae Lee <wonjae.lee@dremio.com>
* Handle case-insensitive conflicts by preferring vector field
  Signed-off-by: Wonjae Lee <wonjae.lee@dremio.com>
* Avoid stream wrappers for derived field lookup
  Signed-off-by: Wonjae Lee <wonjae.lee@dremio.com>
* Handle ambiguous case-insensitive matches without vector hints
  Signed-off-by: Wonjae Lee <wonjae.lee@dremio.com>
* Update src/main/java/org/opensearch/knn/index/codec/KNN10010Codec/KNN10010DerivedSourceStoredFieldsFormat.java
  Co-authored-by: Tejas Shah <shatejas@amazon.com>
  Signed-off-by: Wonjae Lee <38933452+leewjae@users.noreply.github.com>
* Apply spotless formatting for derived source field resolution
  Signed-off-by: Wonjae Lee <wonjae.lee@dremio.com>
* Avoid guessing when case-insensitive matches lack vector hints
  Signed-off-by: Wonjae Lee <wonjae.lee@dremio.com>
* Simplify case-insensitive derived field matching
  Signed-off-by: Wonjae Lee <wonjae.lee@dremio.com>
* Trigger CI rerun for BWC investigation
  Signed-off-by: Wonjae Lee <wonjae.lee@dremio.com>
* Add native engine field info coverage
  Signed-off-by: Wonjae Lee <wonjae.lee@dremio.com>
---------
Signed-off-by: Wonjae Lee <wonjae.lee@dremio.com>
Signed-off-by: Wonjae Lee <38933452+leewjae@users.noreply.github.com>
Signed-off-by: Tejas Shah <shatejas@amazon.com>
Co-authored-by: Tejas Shah <shatejas@amazon.com>
Co-authored-by: Navneet Verma <navneev@amazon.com>
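The commit titles above describe a resolution order for derived source field names: a case-insensitive match, a preference for the vector field when several names collide, and no guess when the collision cannot be settled. The following is a minimal Python sketch of that idea only. It is not the plugin's Java implementation; the function name `resolve_derived_field`, its parameters, and the exact-match-first step are assumptions made for illustration.

```python
from typing import Iterable, Optional, Set


def resolve_derived_field(
    requested: str,
    indexed_fields: Iterable[str],
    vector_fields: Set[str],
) -> Optional[str]:
    """Resolve a field name case-insensitively (hypothetical helper).

    Returns the matching indexed field name, or None when the match is
    missing or ambiguous.
    """
    indexed = list(indexed_fields)

    # Assumption: an exact, case-sensitive match always wins.
    if requested in indexed:
        return requested

    # Collect every indexed name that differs only by letter case.
    folded = requested.lower()
    candidates = [name for name in indexed if name.lower() == folded]
    if len(candidates) == 1:
        return candidates[0]

    # On a conflict, prefer the single candidate known to be a vector field.
    vector_candidates = [name for name in candidates if name in vector_fields]
    if len(vector_candidates) == 1:
        return vector_candidates[0]

    # No match, or an ambiguity with no vector hint: do not guess.
    return None
```

Returning `None` for the ambiguous case keeps the caller responsible for deciding what to do, which mirrors the "avoid guessing" wording of the commit titles.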
* Fixes RescoreParser to pass the rescore flag
  In a multi-node or coordinator/data-node setup, rescore set to false is not passed through streams. This causes rescoring to execute even when the user has explicitly disabled it.
  Signed-off-by: Tejas Shah <shatejas@amazon.com>
* Updates Changelogs, improves code cov
  Signed-off-by: Tejas Shah <shatejas@amazon.com>
* Makes the coordinator port dynamic
  Signed-off-by: Tejas Shah <shatejas@amazon.com>
* Adds BWC test for mode and compression
  Signed-off-by: Tejas Shah <shatejas@amazon.com>
* Does not create compressed indices before 2.18
  Signed-off-by: Tejas Shah <shatejas@amazon.com>
* Fixes bwc
  Signed-off-by: Tejas Shah <shatejas@amazon.com>
---------
Signed-off-by: Tejas Shah <shatejas@amazon.com>
* Rescoring after radial search on quantized index. [Task 1 - 4] (#3300)
* Bumped gradle to 9.4.1 and jacoco to 0.8.14 (#3308)
  Signed-off-by: Andrew Klepchick <aklepchi@amazon.com>
* Use KNN1040ScalarQuantizedVectorsFormat for Faiss SQ flat format (#3302)
  The Faiss SQ format was using Lucene's Lucene104ScalarQuantizedVectorsFormat directly, which lacks the prefetch-enabled raw vector reader that KNN1040ScalarQuantizedVectorsFormat provides. This meant exact search rescoring was missing I/O prefetch during graph traversal.
  Changes:
  - Switch faissSqFlatFormat from Lucene104ScalarQuantizedVectorsFormat to KNN1040ScalarQuantizedVectorsFormat in Faiss1040ScalarQuantizedKnnVectorsFormat
  - Add @VisibleForTesting getFlatVectorsReader() to Faiss1040ScalarQuantizedKnnVectorsReader to replace reflection in tests
  - Add testGetRandomVectorScorer_returnsPrefetchableScorer in KNN1040ScalarQuantizedVectorsFormatTests verifying the scorer is PrefetchableRandomVectorScorer via a real write/read cycle
  - Replace reflection with getter in Faiss1040ScalarQuantizedKnnVectorsFormatTests.testFieldsReader_thenWrapsFlatReaderWithPrefetchSupport
  Signed-off-by: Vijayan Balasubramanian <balasvij@amazon.com>
* Allow minScore, maxDistance for 32x SQ index.
  Signed-off-by: Dooyong Kim <kdooyong@amazon.com>
  Pass compression and quantization config to RNN query builder.
  Signed-off-by: Dooyong Kim <kdooyong@amazon.com>
  Added RescoreRadialSearchQuery.
  Signed-off-by: Dooyong Kim <kdooyong@amazon.com>
  Wiring `RescoreRadialSearchQuery` wrapper in `RNNQueryFactory`
  Signed-off-by: Dooyong Kim <kdooyong@amazon.com>
---------
Signed-off-by: Andrew Klepchick <aklepchi@amazon.com>
Signed-off-by: Vijayan Balasubramanian <balasvij@amazon.com>
Signed-off-by: Dooyong Kim <kdooyong@amazon.com>
Co-authored-by: Andrew Klepchick <aklepchi@amazon.com>
Co-authored-by: Vijayan Balasubramanian <balasvij@amazon.com>
* Rescore radial search quantized complete (#3337)
* Added exact search logic after radial.
  Signed-off-by: Dooyong Kim <kdooyong@amazon.com>
* Adding 2nd rescoring after radial search on quantized index.
  Signed-off-by: Dooyong Kim <kdooyong@amazon.com>
---------
Signed-off-by: Dooyong Kim <kdooyong@amazon.com>
* Update changelog
  Signed-off-by: Dooyong Kim <kdooyong@amazon.com>
---------
Signed-off-by: Andrew Klepchick <aklepchi@amazon.com>
Signed-off-by: Vijayan Balasubramanian <balasvij@amazon.com>
Signed-off-by: Dooyong Kim <kdooyong@amazon.com>
Co-authored-by: Andrew Klepchick <aklepchi@amazon.com>
Co-authored-by: Vijayan Balasubramanian <balasvij@amazon.com>
Signed-off-by: Navneet Verma <navneev@amazon.com>
Signed-off-by: Divya Madala <divyaasm@amazon.com> Co-authored-by: Tejas Shah <shatejas@amazon.com>
Signed-off-by: Andrew Klepchick <aklepchi@amazon.com>
Vectors can now be indexed as base64-encoded strings in addition to JSON arrays. Float vectors use little-endian byte encoding (symmetric with the doc_values binary output format), while byte/binary vectors use raw byte encoding. This enables efficient bulk ingestion pipelines that avoid JSON array serialization overhead. Signed-off-by: Navneet Verma <navneev@amazon.com>
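To make the encoding concrete, here is a minimal client-side sketch in Python of how a vector could be turned into a base64 string before ingestion. It is an illustration under stated assumptions, not the plugin's code: the commit message only says float vectors use little-endian byte encoding and byte/binary vectors use raw bytes, so the 32-bit float width, the signed 8-bit range for byte vectors, and all function names below are assumptions.

```python
import base64
import struct
from typing import List, Sequence


def encode_float_vector(vector: Sequence[float]) -> str:
    """Pack floats as little-endian 32-bit values, then base64-encode.

    Assumption: each component is stored as a 4-byte IEEE 754 float.
    """
    raw = struct.pack(f"<{len(vector)}f", *vector)
    return base64.b64encode(raw).decode("ascii")


def decode_float_vector(encoded: str) -> List[float]:
    """Inverse of encode_float_vector, useful for round-trip checks."""
    raw = base64.b64decode(encoded)
    count = len(raw) // 4  # 4 bytes per assumed 32-bit float
    return list(struct.unpack(f"<{count}f", raw))


def encode_byte_vector(vector: Sequence[int]) -> str:
    """Base64-encode a byte vector as raw bytes.

    Assumption: components are signed 8-bit integers in [-128, 127].
    """
    raw = struct.pack(f"{len(vector)}b", *vector)
    return base64.b64encode(raw).decode("ascii")
```

A pipeline would place the resulting string where the JSON array of numbers would otherwise go, which avoids serializing one JSON number per dimension.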
…NotSupportedException. (#3344) Signed-off-by: Dooyong Kim <kdooyong@amazon.com> Signed-off-by: Doo Yong Kim <kdooyong@amazon.com>
…ersample_factor (#3331)
…3363) Signed-off-by: Divya Madala <divyaasm@amazon.com>
…3367) Signed-off-by: Navneet Verma <navneev@amazon.com>
PR Code Analyzer
❗ AI-powered 'Code-Diff-Analyzer' found issues on commit 80c40d9: 'Diff too large, requires skip by maintainers after manual review'.
Pull Request Author(s): Please update your pull request according to the report above.
Codecov Report
❌ Patch coverage is …
Additional details and impacted files

@@                          Coverage Diff                           @@
##   feature/32x-default-compression    #3371      +/-            ##
=====================================================================
+ Coverage       83.53%    83.73%    +0.19%
- Complexity       4287      4358       +71
=====================================================================
  Files             450       453        +3
  Lines           15552     15773      +221
  Branches         2015      2056       +41
=====================================================================
+ Hits            12991     13207      +216
- Misses           1770      1782       +12
+ Partials          791       784        -7
Merged commit d5f120f into feature/32x-default-compression
Description
Related Issues
Check List
* API changes companion pull request created.
* Commits are signed off using --signoff.
* Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.