Version-aware provider resolution with source-recency provenance #618

Merged: NikTilton merged 12 commits into main from jschick/db-valuemap on Jun 25, 2026

@jschick04

Collaborator

Summary

Makes provider-database resolution version-aware so an older OS's provider can never resolve a newer OS's
events, removes the dependence on database file names for picking a version, and keeps the per-event
resolution hot path allocation-free. Builds the foundation incrementally: distinct provider versions can
coexist in one database, resolution selects the most complete version across all loaded databases, and each
provider row now records its source provenance so resolution prefers the newest source as a deterministic
tiebreak.

Motivation

Provider databases are built per OS (for example Windows Server 2012 R2, 2016, 2019, 2022). Previously a
merged or multi-database setup could resolve an event's description from an arbitrary version (the first row
returned), and cross-database selection leaned on the database file name as a fragile recency proxy. Research
on a real 215k-event corpus confirmed that an EVTX record carries no OS build, file version, or manifest
wording version, so the source version cannot be matched per event; recency is a global tiebreak only, and
"newest source wins" is the correct default. This branch realizes that without any file-name dependence.

What changed (commit arc)

  • Canonical multi-version schema and an upgrade that force-rebuilds older databases to it.
  • Providers merge by (name, content version) so genuinely different versions coexist instead of colliding,
    with cross-file dedup keyed on the same identity.
  • A content-hash VersionKey is stamped at database creation so identical providers across machines collapse
    to one row and different versions get distinct composite keys.
  • The shared provider-content merge primitive is extracted so the database-merge path and the resolve-time
    selection cannot drift.
  • Resolution gathers every provider version across all loaded databases and returns the single most complete
    one unchanged (one coherent version's events, templates, maps, parameters, and messages). This fixes a live
    bug where a merged multi-OS database resolved from an arbitrary version (whichever row was returned first).
  • Each provider row records source provenance (OS build, update revision, edition, display version, and the
    newest message-DLL file version), captured at database creation. Resolution consumes it as a recency
    tiebreak below completeness: a newer-but-empty capture never beats an older populated one.

Schema and migration

The persisted schema stays at version 4; the provenance columns are added within v4 because no build that
stamps the canonical user_version has shipped yet (the stamp lives only on this branch). Every existing
database is therefore unstamped and rebuilds through the existing upgrade path, gaining the new columns with
null provenance (which is re-captured when the database is recreated from a live system). The rebuild reader and the merge path carry provenance
forward so a future rebuild cannot silently drop it. A developer-only (DEBUG) self-heal rebuilds a
stamped-current database whose physical columns no longer match the model; in Release the detection is
byte-identical to before (no behavior change for shipped builds).
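
As a rough illustration of stamp-based detection, the sketch below reads and writes SQLite's `PRAGMA user_version` using Python's standard `sqlite3` module. The state names (`unstamped`, `current`, `unknown`) and the helpers `classify` and `stamp_current` are hypothetical; only the facts that the schema is at version 4 and that unstamped databases report `user_version=0` come from this PR.

```python
import sqlite3

CURRENT_SCHEMA_VERSION = 4  # the persisted schema stays at version 4 in this PR


def classify(conn: sqlite3.Connection) -> str:
    """Map the stored user_version stamp to a coarse schema state (names are hypothetical)."""
    (stamp,) = conn.execute("PRAGMA user_version").fetchone()
    if stamp == 0:
        return "unstamped"  # never stamped: rebuild through the upgrade path
    if stamp == CURRENT_SCHEMA_VERSION:
        return "current"
    return "unknown"  # e.g. a stamp written by a future build


def stamp_current(conn: sqlite3.Connection) -> None:
    """Write the canonical stamp; PRAGMA values cannot be bound as parameters."""
    conn.execute(f"PRAGMA user_version = {CURRENT_SCHEMA_VERSION}")
```

A freshly created SQLite file reports `user_version` 0, which is why every database built before the stamp existed is classified as needing a rebuild.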

Resolution behavior and performance

Selection runs once per provider under the existing single-flight gate and reads only in-memory scalar
fields, so there is no per-event cost and no lazy materialization on the hot path. The selection ladder is:
row-count gate, then completeness (non-empty description count, total description length, message count),
then recency (OS build, update revision, message-DLL file version) with component-wise present-outranks-null
ordering, then load order. The winner is always an existing candidate returned by reference, so no per-event
content blending occurs.
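
The ladder can be sketched as a single tuple comparison. This is illustrative Python, not the project's ProviderVersionSelector; the `Candidate` fields are hypothetical, and two details are assumptions because the description does not spell them out: the row-count gate is modeled as "a candidate with rows outranks one with none", and load order is modeled as "earlier-loaded wins a full tie".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    label: str
    row_count: int
    described: int          # events with a non-empty description
    description_chars: int  # total description length
    messages: int
    os_build: Optional[int] = None
    update_revision: Optional[int] = None
    file_version: Optional[tuple] = None  # parsed message-DLL version, e.g. (10, 0, 20348, 1)


def present_first(value):
    """Order a nullable component so that any present value outranks null."""
    return (0,) if value is None else (1, value)


def rank(candidate: Candidate, load_index: int) -> tuple:
    return (
        candidate.row_count > 0,                                             # row-count gate
        candidate.described, candidate.description_chars, candidate.messages,  # completeness
        present_first(candidate.os_build),                                   # recency, per component
        present_first(candidate.update_revision),
        present_first(candidate.file_version),
        -load_index,                                                         # earlier load wins a tie
    )


def select(candidates: list) -> Candidate:
    """Return the winning candidate object itself; nothing is copied or blended."""
    best = max(range(len(candidates)), key=lambda i: rank(candidates[i], i))
    return candidates[best]
```

Because completeness sits above recency in the tuple, a newer capture with no descriptions cannot outrank an older populated one, and the function only reads scalar fields, so it can run once per provider and not once per event.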

Testing

All suites pass: Provider unit, Provider.Database unit and integration, Eventing unit and integration,
DatabaseTools integration, and Runtime unit. New tests cover version coexistence and the composite key, the
content-hash VersionKey, most-complete selection, the recency ladder (including null and unparseable
degradation), provenance round-trip and rebuild preservation, the v3-to-v4 upgrade adding null-provenance
columns, the developer-only in-place self-heal, host OS provenance reads, and that provenance does not
perturb the VersionKey.

Notes and follow-ups (not in this PR)

  • Offline merge-tiebreak (per-event best-of plus a variants store) and a pin UI are deferred; this PR keeps
    the coherent whole-version selection model.
  • Offline image (ISO/WIM) extraction to build provenanced multi-OS databases from images is a separate later
    phase.

Copilot AI review requested due to automatic review settings June 25, 2026 00:20
@jschick04 jschick04 requested a review from a team as a code owner June 25, 2026 00:20
@jschick04 jschick04 marked this pull request as draft June 25, 2026 00:23

Copilot AI left a comment

Contributor

Pull request overview

This PR implements version-aware provider resolution across one or more loaded provider databases, so event descriptions are resolved from the most complete provider “version” available, with deterministic recency-based tie-breaking using captured source provenance (OS build/revision and message-DLL file version). It also introduces a content-hash VersionKey to allow multiple versions of a provider name to coexist in a single database via a composite primary key.

Changes:

  • Add ProviderIdentity (SQLite NOCASE-compatible) and ProviderVersionSelector to support multi-version selection using completeness + provenance recency.
  • Extend the provider database model/schema to include VersionKey, persisted value maps, and per-provider provenance; update upgrade/detection logic to use PRAGMA user_version stamping.
  • Update database tooling and tests to operate on provider identities (name + version key) rather than name-only, and add hashing canonicalization + key computation tests.

Reviewed changes

Copilot reviewed 36 out of 36 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| tests/Unit/EventLogExpert.Provider.Tests/Resolution/ProviderVersionSelectorTests.cs | Adds unit coverage for completeness/recency selection behavior. |
| tests/Unit/EventLogExpert.Provider.Tests/Resolution/ProviderIdentityTests.cs | Verifies identity equality semantics match SQLite NOCASE behavior. |
| tests/Unit/EventLogExpert.Provider.Database.Tests/Serialization/ProviderJsonContextTests.cs | Extends JSON context coverage for value-map serialization types. |
| tests/Unit/EventLogExpert.Provider.Database.Tests/Maintenance/ProviderDetailsMergerTests.cs | Adds tests for version-key-aware merging and map merge/conflict behavior. |
| tests/Unit/EventLogExpert.Provider.Database.Tests/Hashing/VersionKeyCalculatorTests.cs | Adds unit tests for deterministic canonicalization + hashing invariants. |
| tests/Integration/EventLogExpert.Provider.Database.IntegrationTests/Context/ProviderDbContextTests.cs | Adds integration coverage for composite PK, stamping, provenance persistence, and rebuild behavior. |
| tests/Integration/EventLogExpert.Eventing.IntegrationTests/Resolvers/EventProviderDatabaseEventResolverTests.cs | Updates resolver integration tests for “most complete wins” across multiple DBs. |
| tests/Integration/EventLogExpert.Eventing.IntegrationTests/PublisherMetadata/HostOsProvenanceTests.cs | Adds integration tests for host OS provenance registry reads. |
| tests/Integration/EventLogExpert.DatabaseTools.IntegrationTests/Sources/ProviderSourceTests.cs | Adds integration coverage for non-ASCII NOCASE behavior and cross-file version coexistence. |
| tests/Integration/EventLogExpert.DatabaseTools.IntegrationTests/Operations/MergeDatabaseOperationTests.cs | Verifies identity-aware merge behavior for overwrite/non-overwrite scenarios. |
| tests/Integration/EventLogExpert.DatabaseTools.IntegrationTests/Operations/DiffDatabaseOperationTests.cs | Verifies identity-aware diff copies “new versions” of an existing provider name. |
| tests/Integration/EventLogExpert.DatabaseTools.IntegrationTests/Operations/CreateDatabaseOperationTests.cs | Verifies create stamps content-hash keys and collapses post-stamp collisions. |
| src/EventLogExpert.Runtime/Database/DatabaseFileOperations.cs | Adjusts readiness checks to use NeedsUpgrade vs version equality. |
| src/EventLogExpert.Runtime/Database/DatabaseClassificationService.cs | Maps schema state (incl. stamped/unknown) to UI/logic status. |
| src/EventLogExpert.Provider/Schema/DatabaseSchemaVersion.cs | Keeps schema constants; formatting tweak. |
| src/EventLogExpert.Provider/Schema/DatabaseSchemaState.cs | Allows explicit NeedsUpgrade override (e.g., future user_version). |
| src/EventLogExpert.Provider/Resolution/ProviderVersionSelector.cs | Implements selection of most complete provider version with provenance recency tie-break. |
| src/EventLogExpert.Provider/Resolution/ProviderIdentity.cs | Adds composite provider identity matching SQLite NOCASE semantics. |
| src/EventLogExpert.Provider/Resolution/ProviderDetails.cs | Adds VersionKey and provenance fields to provider details. |
| src/EventLogExpert.Provider/Resolution/ProviderContentMerge.cs | Centralizes merge equivalence/identity rules shared by maintenance and future union logic. |
| src/EventLogExpert.Provider/Resolution/DatabasePathSorter.cs | Removes large doc block; functional sort remains. |
| src/EventLogExpert.Provider/Lookup/IProviderDetailsLookup.cs | Adds FindAllProviderVersions for multi-version resolution. |
| src/EventLogExpert.Provider.Database/Serialization/ProviderJsonContext.cs | Adds STJ source-gen types for value-map persistence. |
| src/EventLogExpert.Provider.Database/Maintenance/ProviderDetailsMerger.cs | Groups/merges duplicates by (name, version) identity; merges maps + provenance fields. |
| src/EventLogExpert.Provider.Database/Hashing/VersionKeyCalculator.cs | Adds content-hash version key computation + base32 encoding. |
| src/EventLogExpert.Provider.Database/Hashing/ProviderContentCanonicalizer.cs | Adds canonical binary encoding for stable content hashing. |
| src/EventLogExpert.Provider.Database/Context/ProviderDbContext.cs | Introduces composite PK, stamps user_version, persists maps, reads/writes provenance, and adds multi-version query API. |
| src/EventLogExpert.Eventing/Resolvers/EventResolver.cs | Resolves providers by collecting all DB versions and selecting the most complete+newest. |
| src/EventLogExpert.Eventing/PublisherMetadata/MtaProviderSource.cs | Updates skip/dedup logic to use provider identities. |
| src/EventLogExpert.Eventing/PublisherMetadata/HostOsProvenance.cs | Reads host OS provenance from registry for capture-time stamping. |
| src/EventLogExpert.Eventing/PublisherMetadata/EventMessageProvider.cs | Records newest message DLL file version as provenance for recency tie-breaks. |
| src/EventLogExpert.DatabaseTools/MergeDatabase/MergeDatabaseOperation.cs | Makes merge identity-aware for delete/skip behavior under composite key. |
| src/EventLogExpert.DatabaseTools/DiffDatabase/DiffDatabaseOperation.cs | Makes diff identity-aware (copies missing versions, not name-only). |
| src/EventLogExpert.DatabaseTools/CreateDatabase/CreateDatabaseOperation.cs | Stamps content-hash VersionKey, collapses post-stamp identity collisions, and stamps host OS provenance for local captures. |
| src/EventLogExpert.DatabaseTools/Common/Operations/ProviderSource.cs | Adds identity discovery + identity-aware provider loading across db/evtx sources. |
| src/EventLogExpert.DatabaseTools/Common/Operations/OperationBase.cs | Renames/clarifies name-level exclude semantics for local provider loading. |

Comment thread src/EventLogExpert.Provider.Database/Hashing/VersionKeyCalculator.cs Outdated
Comment thread src/EventLogExpert.Provider.Database/Maintenance/ProviderDetailsMerger.cs Outdated
@jschick04 jschick04 requested a review from Copilot June 25, 2026 01:08
@jschick04

Collaborator Author

Addressed the three review comments in 2e2e5f3:

  1. Stale doc reference (VersionKeyCalculator): Updated the XML doc to match current behavior. VersionKeyCalculator.Compute is invoked only in CreateDatabaseOperation; the merge and diff operations copy already-stamped rows, so the doc no longer claims they stamp the key.

  2. Dead-symbol drift comment (VersionKeyCalculatorTests): The drift-guard comment now points to ProviderContentMerge.EventsAreEquivalent (the current location) instead of the removed ProviderDetailsMerger.EventsAreEquivalent.

  3. Order-dependent provenance merge (ProviderDetailsMerger): Replaced the per-field FirstOrDefault carry with a deterministic pick. The recency comparison was extracted into a shared ProviderSourceRecency used by both ProviderVersionSelector (resolve-time tiebreak) and the merger, so the two cannot drift on which source is newer. The merged row now takes all five provenance fields coherently from the newest source row, and a total-order tiebreak (recency, then ordinal edition / display version / file-version string with null lowest) keeps the choice deterministic even when recency ties.

Added tests: ProviderSourceRecencyTests (ordering, present-outranks-absent, unparseable degrade) and merger tests asserting the newest source provenance is carried as a coherent unit and that a full-recency tie resolves deterministically regardless of row order. All suites pass.
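
The deterministic pick described in item 3 can be sketched as follows. This is illustrative Python and not the project's ProviderSourceRecency or ProviderDetailsMerger; the `Provenance` tuple, `parse_version`, and `merged_provenance` are hypothetical names, and treating an unparseable file version as absent for the recency comparison is an assumption based on the "unparseable degrade" wording.

```python
from typing import NamedTuple, Optional


class Provenance(NamedTuple):
    os_build: Optional[int]
    update_revision: Optional[int]
    edition: Optional[str]
    display_version: Optional[str]
    file_version: Optional[str]  # raw string, e.g. "10.0.20348.1"


def parse_version(text: Optional[str]) -> Optional[tuple]:
    """Return a tuple of ints, or None when the string is missing or unparseable."""
    if text is None:
        return None
    try:
        return tuple(int(part) for part in text.split("."))
    except ValueError:
        return None


def null_lowest(value):
    """Order a nullable component so that null sorts below every present value."""
    return (0,) if value is None else (1, value)


def sort_key(p: Provenance) -> tuple:
    return (
        # Recency: OS build, update revision, parsed file version.
        null_lowest(p.os_build),
        null_lowest(p.update_revision),
        null_lowest(parse_version(p.file_version)),
        # Total-order tiebreak: code-point comparison of the remaining strings.
        null_lowest(p.edition),
        null_lowest(p.display_version),
        null_lowest(p.file_version),
    )


def merged_provenance(rows: list) -> Provenance:
    """Take all five fields from the single newest row instead of mixing fields."""
    return max(rows, key=sort_key)
```

Returning one whole row keeps the five fields coherent, and because the key covers every field, the result does not depend on the order in which the source rows are visited.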

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 38 out of 38 changed files in this pull request and generated 2 comments.

Comment thread src/EventLogExpert.Provider.Database/Context/ProviderDbContext.cs Outdated
Comment thread src/EventLogExpert.DatabaseTools/MergeDatabase/MergeDatabaseOperation.cs Outdated
@jschick04 jschick04 requested a review from Copilot June 25, 2026 02:23
@jschick04

Collaborator Author

Addressed the latest review round (commits 0ac0f39, 2367592) plus a finding from a full pre-publish review panel:

  1. Schema-invariant comment (ProviderDbContext.OnModelCreating): Rewrote it to match the PR's stay-on-v4 approach. The must-bump-Current rule applies once a build that stamps user_version has shipped; during the prerelease window (no stamped build yet) the v4 shape is finalized in place because every real database is unstamped (user_version=0) and rebuilds through the upgrade path, with a dev-only (DEBUG) self-heal for stamped developer databases.

  2. Merge log wording (MergeDatabaseOperation): "Copying providers" -> "Copying provider versions" and "Copied N provider(s)" -> "provider version(s)", consistent with the version-aware vocabulary used elsewhere in the operation.

  3. Merge now fails on an un-upgraded source (new): A pre-publish review found that MergeDatabaseOperation validated the target schema but not the source, so an obsolete/unstamped source DB (now classified as needing an upgrade) made merge report success while copying nothing. Merge now validates the source schema up front (mirroring DiffDatabaseOperation) and returns Failed with the existing "run the upgrade command on the source first" error. Added a regression test (MergeDatabase_WithSourceNeedingUpgrade_FailsInsteadOfReportingSuccess).

All suites green; build clean (Debug + Release).

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 38 out of 38 changed files in this pull request and generated 2 comments.

@jschick04

Collaborator Author

Addressed the latest two review comments:

  1. Nondeterministic identity ordering (ProviderSource.LoadProviderIdentitiesAsync): fixed (bc6c0b6). The result sort used OrderBy(ProviderName, OrdinalIgnoreCase).ThenBy(VersionKey, Ordinal). OrdinalIgnoreCase folds full-Unicode case, but ProviderIdentity (matching SQLite NOCASE) folds ASCII only, so two names that differ solely by non-ASCII case are distinct identities that tie on the primary sort key; because the source is a HashSet, they could be returned in nondeterministic order. Added a middle .ThenBy(ProviderName, Ordinal) tiebreak so the sort is a total, deterministic order. Added a regression test (LoadProviderIdentities_NonAsciiCaseVariantNames_ReturnsDeterministicOrder).

  2. Per-row stub deletes in the merge overwrite path — keeping as-is (by design). The overwrite deletes only the colliding (ProviderName, VersionKey) identities by composite primary key so that other versions of the same provider name in the target survive. A set-based delete isn't available here because SQLite cannot translate a composite-key IN (the same reason the surrounding queries chunk by name), and a name-based ExecuteDelete would over-delete non-colliding versions — a correctness regression. The deletes are EF-batched, symmetric with the re-inserts inside a single transaction, and this is an offline admin operation, not a runtime hot path.
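
The ordering issue in item 1 can be reproduced in a few lines of illustrative Python (not the project's C#). `ascii_fold` mimics the ASCII-only folding of SQLite's built-in NOCASE collation, and `str.upper()` is used only as a rough stand-in for a full-Unicode case-insensitive comparison such as OrdinalIgnoreCase; both helper names are hypothetical.

```python
def ascii_fold(name: str) -> str:
    """Lower-case only A-Z, leaving every non-ASCII character untouched."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in name)


def sorted_identities(identities):
    """Sort (name, version_key) pairs with a middle ordinal tiebreak on the name.

    Without the middle component, names that differ only by non-ASCII case
    compare equal on the first key and their relative order is left to the
    iteration order of the input set.
    """
    return sorted(identities, key=lambda pair: (pair[0].upper(), pair[0], pair[1]))


# "Émile" and "émile" are distinct identities under ASCII-only folding,
# yet they tie under a full-Unicode case-insensitive comparison.
ids = {("Émile", "k1"), ("émile", "k1"), ("Alpha", "k2"), ("alpha2", "k0")}
ordered = sorted_identities(ids)
```

With the three-part key the sort is a total order over distinct identities, so the result is the same no matter how the underlying set happens to enumerate.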

All suites green; build clean (Debug + Release).

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 38 out of 38 changed files in this pull request and generated no new comments.

@jschick04 jschick04 marked this pull request as ready for review June 25, 2026 03:02
@NikTilton NikTilton merged commit 9142ddf into main Jun 25, 2026
8 checks passed
@NikTilton NikTilton deleted the jschick/db-valuemap branch June 25, 2026 04:01