Version-aware provider resolution with source-recency provenance #618
Pull request overview
This PR implements version-aware provider resolution across one or more loaded provider databases, so event descriptions are resolved from the most complete provider “version” available, with deterministic recency-based tie-breaking using captured source provenance (OS build/revision and message-DLL file version). It also introduces a content-hash VersionKey to allow multiple versions of a provider name to coexist in a single database via a composite primary key.
Changes:
- Add `ProviderIdentity` (SQLite NOCASE-compatible) and `ProviderVersionSelector` to support multi-version selection using completeness + provenance recency.
- Extend the provider database model/schema to include `VersionKey`, persisted value maps, and per-provider provenance; update upgrade/detection logic to use `PRAGMA user_version` stamping.
- Update database tooling and tests to operate on provider identities (name + version key) rather than name only, and add hashing canonicalization + key computation tests.
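For illustration only, here is a minimal Python sketch (the project itself is C#) of what "SQLite NOCASE-compatible" identity equality implies. It assumes the name half compares under SQLite's built-in `NOCASE` collation, which folds only the 26 ASCII letters, and that the version key compares exactly. `ProviderIdentitySketch` and `nocase_fold` are hypothetical names, not the PR's `ProviderIdentity` API.

```python
def nocase_fold(name: str) -> str:
    # SQLite's NOCASE folds only ASCII A-Z; every other character,
    # including accented and non-Latin letters, compares as-is.
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in name)


class ProviderIdentitySketch:
    """Hypothetical (name, version key) identity with NOCASE name equality."""

    __slots__ = ("name", "version_key")

    def __init__(self, name: str, version_key: str) -> None:
        self.name = name
        self.version_key = version_key

    def _key(self):
        # The version key is assumed to compare exactly (binary).
        return (nocase_fold(self.name), self.version_key)

    def __eq__(self, other):
        if not isinstance(other, ProviderIdentitySketch):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        # Hash the folded key so equal identities land in the same bucket.
        return hash(self._key())

    def __repr__(self):
        return f"ProviderIdentitySketch({self.name!r}, {self.version_key!r})"
```

Because `Ä` and `ä` stay distinct here, a general-purpose case-insensitive comparison that also folds non-ASCII letters would disagree with SQLite on such names; matching NOCASE exactly keeps in-memory dedup and the database's primary key in agreement.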
Reviewed changes
Copilot reviewed 36 out of 36 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tests/Unit/EventLogExpert.Provider.Tests/Resolution/ProviderVersionSelectorTests.cs | Adds unit coverage for completeness/recency selection behavior. |
| tests/Unit/EventLogExpert.Provider.Tests/Resolution/ProviderIdentityTests.cs | Verifies identity equality semantics match SQLite NOCASE behavior. |
| tests/Unit/EventLogExpert.Provider.Database.Tests/Serialization/ProviderJsonContextTests.cs | Extends JSON context coverage for value-map serialization types. |
| tests/Unit/EventLogExpert.Provider.Database.Tests/Maintenance/ProviderDetailsMergerTests.cs | Adds tests for version-key-aware merging and map merge/conflict behavior. |
| tests/Unit/EventLogExpert.Provider.Database.Tests/Hashing/VersionKeyCalculatorTests.cs | Adds unit tests for deterministic canonicalization + hashing invariants. |
| tests/Integration/EventLogExpert.Provider.Database.IntegrationTests/Context/ProviderDbContextTests.cs | Adds integration coverage for composite PK, stamping, provenance persistence, and rebuild behavior. |
| tests/Integration/EventLogExpert.Eventing.IntegrationTests/Resolvers/EventProviderDatabaseEventResolverTests.cs | Updates resolver integration tests for “most complete wins” across multiple DBs. |
| tests/Integration/EventLogExpert.Eventing.IntegrationTests/PublisherMetadata/HostOsProvenanceTests.cs | Adds integration tests for host OS provenance registry reads. |
| tests/Integration/EventLogExpert.DatabaseTools.IntegrationTests/Sources/ProviderSourceTests.cs | Adds integration coverage for non-ASCII NOCASE behavior and cross-file version coexistence. |
| tests/Integration/EventLogExpert.DatabaseTools.IntegrationTests/Operations/MergeDatabaseOperationTests.cs | Verifies identity-aware merge behavior for overwrite/non-overwrite scenarios. |
| tests/Integration/EventLogExpert.DatabaseTools.IntegrationTests/Operations/DiffDatabaseOperationTests.cs | Verifies identity-aware diff copies “new versions” of an existing provider name. |
| tests/Integration/EventLogExpert.DatabaseTools.IntegrationTests/Operations/CreateDatabaseOperationTests.cs | Verifies create stamps content-hash keys and collapses post-stamp collisions. |
| src/EventLogExpert.Runtime/Database/DatabaseFileOperations.cs | Adjusts readiness checks to use NeedsUpgrade vs version equality. |
| src/EventLogExpert.Runtime/Database/DatabaseClassificationService.cs | Maps schema state (incl. stamped/unknown) to UI/logic status. |
| src/EventLogExpert.Provider/Schema/DatabaseSchemaVersion.cs | Keeps schema constants; formatting tweak. |
| src/EventLogExpert.Provider/Schema/DatabaseSchemaState.cs | Allows explicit NeedsUpgrade override (e.g., future user_version). |
| src/EventLogExpert.Provider/Resolution/ProviderVersionSelector.cs | Implements selection of most complete provider version with provenance recency tie-break. |
| src/EventLogExpert.Provider/Resolution/ProviderIdentity.cs | Adds composite provider identity matching SQLite NOCASE semantics. |
| src/EventLogExpert.Provider/Resolution/ProviderDetails.cs | Adds VersionKey and provenance fields to provider details. |
| src/EventLogExpert.Provider/Resolution/ProviderContentMerge.cs | Centralizes merge equivalence/identity rules shared by maintenance and future union logic. |
| src/EventLogExpert.Provider/Resolution/DatabasePathSorter.cs | Removes large doc block; functional sort remains. |
| src/EventLogExpert.Provider/Lookup/IProviderDetailsLookup.cs | Adds FindAllProviderVersions for multi-version resolution. |
| src/EventLogExpert.Provider.Database/Serialization/ProviderJsonContext.cs | Adds STJ source-gen types for value-map persistence. |
| src/EventLogExpert.Provider.Database/Maintenance/ProviderDetailsMerger.cs | Groups/merges duplicates by (name, version) identity; merges maps + provenance fields. |
| src/EventLogExpert.Provider.Database/Hashing/VersionKeyCalculator.cs | Adds content-hash version key computation + base32 encoding. |
| src/EventLogExpert.Provider.Database/Hashing/ProviderContentCanonicalizer.cs | Adds canonical binary encoding for stable content hashing. |
| src/EventLogExpert.Provider.Database/Context/ProviderDbContext.cs | Introduces composite PK, stamps user_version, persists maps, reads/writes provenance, and adds multi-version query API. |
| src/EventLogExpert.Eventing/Resolvers/EventResolver.cs | Resolves providers by collecting all DB versions and selecting the most complete+newest. |
| src/EventLogExpert.Eventing/PublisherMetadata/MtaProviderSource.cs | Updates skip/dedup logic to use provider identities. |
| src/EventLogExpert.Eventing/PublisherMetadata/HostOsProvenance.cs | Reads host OS provenance from registry for capture-time stamping. |
| src/EventLogExpert.Eventing/PublisherMetadata/EventMessageProvider.cs | Records newest message DLL file version as provenance for recency tie-breaks. |
| src/EventLogExpert.DatabaseTools/MergeDatabase/MergeDatabaseOperation.cs | Makes merge identity-aware for delete/skip behavior under composite key. |
| src/EventLogExpert.DatabaseTools/DiffDatabase/DiffDatabaseOperation.cs | Makes diff identity-aware (copies missing versions, not name-only). |
| src/EventLogExpert.DatabaseTools/CreateDatabase/CreateDatabaseOperation.cs | Stamps content-hash VersionKey, collapses post-stamp identity collisions, and stamps host OS provenance for local captures. |
| src/EventLogExpert.DatabaseTools/Common/Operations/ProviderSource.cs | Adds identity discovery + identity-aware provider loading across db/evtx sources. |
| src/EventLogExpert.DatabaseTools/Common/Operations/OperationBase.cs | Renames/clarifies name-level exclude semantics for local provider loading. |
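
The identity-aware diff and merge rows above come down to comparing (name, version key) pairs rather than names alone. A minimal Python sketch, assuming each database has already been reduced to an iterable of such pairs; `missing_identities` is a hypothetical helper, not the DatabaseTools API.

```python
def nocase_fold(name: str) -> str:
    # Match SQLite NOCASE: fold ASCII A-Z only.
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in name)


def missing_identities(source, target):
    """Return (name, version_key) pairs from source whose identity target lacks.

    A name-only diff would skip a provider whose name already exists in the
    target; keying on the full identity also copies its new versions.
    """
    present = {(nocase_fold(name), key) for name, key in target}
    seen = set()
    result = []
    for name, key in source:
        ident = (nocase_fold(name), key)
        if ident not in present and ident not in seen:
            seen.add(ident)  # collapse duplicates within the source itself
            result.append((name, key))
    return result
```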
Addressed the three review comments in 2e2e5f3:
Added tests:

Addressed the latest review round (commits 0ac0f39, 2367592) plus a finding from a full pre-publish review panel:
All suites green; build clean (Debug + Release).

Addressed the latest two review comments:
All suites green; build clean (Debug + Release).
Summary
This PR makes provider-database resolution version-aware so that an older OS's provider can never resolve a
newer OS's events, removes the dependence on database file names for picking a version, and keeps the
per-event resolution hot path allocation-free. It builds the foundation incrementally: distinct provider
versions can coexist in one database, resolution selects the most complete version across all loaded
databases, and each provider row now records its source provenance so that resolution prefers the newest
source as a deterministic tiebreak.
Motivation
Provider databases are built per OS (for example Windows Server 2012 R2, 2016, 2019, 2022). Previously a
merged or multi-database setup could resolve an event's description from an arbitrary version (the first row
returned), and cross-database selection relied on the database file name as a fragile recency proxy. Analysis
of a real 215k-event corpus confirmed that an EVTX record carries no OS build, file version, or manifest
wording version, so the source version cannot be matched per event; recency can serve only as a global
tiebreak, and "newest source wins" is the correct default. This branch implements that default without any
file-name dependence.
What changed (commit arc)
with cross-file dedup keyed on the same identity.
to one row and different versions get distinct composite keys.
selection cannot drift.
one, unchanged (one coherent version's events, templates, maps, parameters, and messages), fixing a live
arbitrary-version bug where a merged multi-OS database resolved a random version.
newest message-DLL file version), captured at database creation. Resolution consumes it as a recency
tiebreak below completeness: a newer-but-empty capture never beats an older populated one.
Schema and migration
The persisted schema stays at version 4; the provenance columns are added within v4 because no build that
stamps the canonical user_version has shipped yet (the stamp lives only on this branch). Every existing
database is therefore unstamped and rebuilds through the existing upgrade path, gaining the new columns with
null provenance (re-captured on a live recreate). The rebuild reader and the merge path carry provenance
forward so a future rebuild cannot silently drop it. A developer-only (DEBUG) self-heal rebuilds a
stamped-current database whose physical columns no longer match the model; in Release the detection is
byte-identical to before (no behavior change for shipped builds).
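A minimal Python `sqlite3` sketch of the stamp check, the composite key, and the developer-only column check. The `providers` table, its column names, and `CURRENT_STAMP` are hypothetical stand-ins; the PR does not show the real schema or the canonical `user_version` value. A fresh or unstamped database reports `PRAGMA user_version` as 0 and therefore takes the rebuild path, and the `NOCASE` collation on `name` makes the composite primary key reject a case-only duplicate of an existing (name, version key) pair while still admitting a second version of the same name.

```python
import sqlite3

CURRENT_STAMP = 4  # hypothetical; the real canonical stamp is not stated

SCHEMA = """
CREATE TABLE providers (
    name            TEXT NOT NULL COLLATE NOCASE,
    version_key     TEXT NOT NULL,
    details         BLOB,
    os_build        INTEGER,  -- provenance columns are nullable so rows
    update_revision INTEGER,  -- carried through a rebuild keep NULL until
    file_version    TEXT,     -- a live recreate re-captures them
    PRIMARY KEY (name, version_key)
);
"""

EXPECTED_COLUMNS = {"name", "version_key", "details",
                    "os_build", "update_revision", "file_version"}


def needs_upgrade(conn: sqlite3.Connection) -> bool:
    (stamp,) = conn.execute("PRAGMA user_version").fetchone()
    return stamp != CURRENT_STAMP


def create_current(conn: sqlite3.Connection) -> None:
    conn.executescript(SCHEMA)
    conn.execute(f"PRAGMA user_version = {CURRENT_STAMP}")


def needs_dev_self_heal(conn: sqlite3.Connection) -> bool:
    # DEBUG-only idea: stamped current, but the physical columns drifted.
    cols = {row[1] for row in conn.execute("PRAGMA table_info(providers)")}
    return not needs_upgrade(conn) and cols != EXPECTED_COLUMNS
```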
Resolution behavior and performance
Selection runs once per provider under the existing single-flight gate and reads only in-memory scalar
fields, so there is no per-event cost and no lazy materialization on the hot path. The selection ladder is:
row-count gate, then completeness (non-empty description count, total description length, message count),
then recency (OS build, update revision, message-DLL file version) with component-wise present-outranks-null
ordering, then load order. The winner is always an existing candidate returned by reference, so no per-event
content blending occurs.
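The selection ladder can be sketched as a single lexicographic sort key. This is a minimal Python illustration, not the PR's `ProviderVersionSelector`: the `Candidate` fields are hypothetical scalar summaries, the row-count gate is assumed to mean "a candidate with rows beats one without", and the final load-order rung is assumed to favor the earlier-loaded database. Each recency component uses present-outranks-null ordering, and a missing or unparseable file version degrades to null rather than failing.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Candidate:
    # Hypothetical per-version summary; every field is an in-memory scalar.
    row_count: int
    nonempty_descriptions: int
    total_description_length: int
    message_count: int
    os_build: Optional[int]
    update_revision: Optional[int]
    file_version: Optional[str]  # e.g. "10.0.20348.1"; may be unparseable
    load_order: int


def parse_version(text: Optional[str]) -> Optional[Tuple[int, ...]]:
    # Missing or unparseable versions degrade to None (treated as null).
    if not text:
        return None
    try:
        return tuple(int(part) for part in text.split("."))
    except ValueError:
        return None


def present_first(value):
    # A present value outranks null; two nulls tie and defer to the next rung.
    return (1, value) if value is not None else (0,)


def ladder_key(c: Candidate):
    return (
        c.row_count > 0,                              # row-count gate (assumed)
        c.nonempty_descriptions,                      # completeness...
        c.total_description_length,
        c.message_count,
        present_first(c.os_build),                    # ...then recency
        present_first(c.update_revision),
        present_first(parse_version(c.file_version)),
        -c.load_order,                                # earlier load wins ties
    )


def select(candidates: Sequence[Candidate]) -> Candidate:
    # Returns one existing candidate by reference; nothing is blended.
    return max(candidates, key=ladder_key)
```

Because completeness precedes recency in the key, a newer but empty capture can never outrank an older populated one, which matches the ordering the paragraph describes.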
Testing
All suites pass: Provider unit, Provider.Database unit and integration, Eventing unit and integration,
DatabaseTools integration, and Runtime unit. New tests cover version coexistence and the composite key, the
content-hash VersionKey, most-complete selection, the recency ladder (including null and unparseable
degradation), provenance round-trip and rebuild preservation, the v3-to-v4 upgrade adding null-provenance
columns, the developer-only in-place self-heal, host OS provenance reads, and that provenance does not
perturb the VersionKey.
Notes and follow-ups (not in this PR)
the coherent whole-version selection model.
phase.