Feat/implementation fingerprinting by 0xlaga · Pull Request #12 · jharveyb/gossip_observer

0xlaga · 2026-04-01T19:29:36Z

`impl_fingerprint` — Lightning node implementation fingerprinting

Closes #10.

What's in this PR

| Area | Description |
| --- | --- |
| New crate | `impl_fingerprint/`: standalone binary with `scrape`, `classify`, and `validate` subcommands |
| Upstream changes | Minimal additions to `observer_common` (proto + types) and `gossip_analyze` (dump plumbing) to expose `node_features` hex |
| Scrapers | Version records for 4 implementations (14 versions), hardcoded in `src/scraper/` from upstream source code; no network access at runtime |
| Classifier | Three-layer cascade that short-circuits on first confident match |
| Validation | Training-set scorer with per-implementation precision/recall and confusion matrix |
| Test suite | 142 integration tests, all passing, zero clippy warnings |
| Demo script | `demo.sh` exercises the full pipeline end-to-end with color-coded terminal output |

Implementation coverage

| Implementation | Versions | Source files studied |
| --- | --- | --- |
| LND | v0.15.5-beta → v0.18.4-beta | `feature/default_sets.go`, `chainreg/chainregistry.go` |
| CLN | v23.11.2 → v24.11.1 | `common/features.c`, `lightningd/options.c` |
| LDK | v0.0.118 → v0.2.2 | `lightning/src/ln/channelmanager.rs`, `lightning/src/util/config.rs` |
| Eclair | v0.9.0 → v0.10.0 | `reference.conf`, `Features.scala` |

Classifier layers

1. Exact hex: the node's `node_features` hex matches a DB record exactly → high confidence, specific version range.
2. Heuristic bits: individual feature bits are tested against each version's mandatory/optional/not-set requirements. NotSet constraints are the key differentiators (e.g. LDK never sets gossip-queries/amp, Eclair never sets amp/keysend).
3. Policy scoring: channel policy defaults are scored against each implementation's known defaults (LND: cltv=80/fee_ppm=1, CLN: cltv=34/fee_ppm=10, LDK: cltv=72/fee_ppm=0, Eclair: cltv=144/fee_ppm=200).
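The policy-scoring layer can be illustrated with a minimal sketch. It uses only the four cltv/fee_ppm defaults listed above; the one-point-per-matching-field rule, the tie handling, and the names `score` and `best_match` are assumptions made for illustration and are not taken from `classifier.rs`.

```rust
/// Default channel policy per implementation (values quoted from the PR text).
#[derive(Debug, Clone, Copy, PartialEq)]
struct PolicyDefaults {
    name: &'static str,
    cltv_expiry_delta: u16,
    fee_ppm: u32,
}

const DEFAULTS: [PolicyDefaults; 4] = [
    PolicyDefaults { name: "LND", cltv_expiry_delta: 80, fee_ppm: 1 },
    PolicyDefaults { name: "CLN", cltv_expiry_delta: 34, fee_ppm: 10 },
    PolicyDefaults { name: "LDK", cltv_expiry_delta: 72, fee_ppm: 0 },
    PolicyDefaults { name: "Eclair", cltv_expiry_delta: 144, fee_ppm: 200 },
];

/// Hypothetical scoring rule: one point for each field equal to the default.
fn score(observed_cltv: u16, observed_fee_ppm: u32, d: &PolicyDefaults) -> u32 {
    (observed_cltv == d.cltv_expiry_delta) as u32 + (observed_fee_ppm == d.fee_ppm) as u32
}

/// Returns the unique best-scoring implementation.
/// Yields `None` when nothing matches or when the top score is tied.
fn best_match(observed_cltv: u16, observed_fee_ppm: u32) -> Option<&'static str> {
    let mut scored: Vec<(u32, &'static str)> = DEFAULTS
        .iter()
        .map(|d| (score(observed_cltv, observed_fee_ppm, d), d.name))
        .collect();
    // Highest score first.
    scored.sort_by(|a, b| b.0.cmp(&a.0));
    match scored.as_slice() {
        [(top, name), (next, _), ..] if *top > 0 && top > next => Some(*name),
        _ => None,
    }
}
```

Under these assumptions, `best_match(80, 1)` picks LND, while `best_match(80, 10)` returns `None` because LND and CLN each match one field. A real scorer would likely weight fields and aggregate over several channels per node.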

Crate structure

impl_fingerprint/
├── src/
│   ├── main.rs          # CLI (clap)
│   ├── lib.rs           # public API re-exports
│   ├── scraper/         # {lnd,cln,ldk,eclair}.rs — hardcoded version records
│   ├── db.rs            # fingerprint database (JSON serde)
│   ├── input.rs         # load nodes/channels from gossip_analyze dumps
│   ├── classifier.rs    # 3-layer cascade
│   └── validate.rs      # accuracy scoring + confusion matrix
├── tests/               # 142 integration tests + synthetic fixtures
└── demo.sh              # end-to-end pipeline demo

…ests

- Add `node_features` (`Vec<u8>`, LE bytes) to `observer_common::types::NodeAnnouncementInfo`
- Add `node_features` (bytes) field to `common.proto` `NodeAnnouncementInfo` (field 4)
- Regenerate `common.rs` from updated proto
- Update all `From` impls in `types.rs` to populate `node_features` via `le_flags()`
- Add `node_features` (hex `String`, BE) to `gossip_analyze::NodeAnnInfo`
- Add `observer_common/tests/proto_roundtrip.rs` (8 integration tests)
- Add `gossip_analyze/tests/serde_roundtrip.rs` (7 integration tests)
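The relationship between the little-endian bytes stored in `observer_common` and the big-endian hex exposed by `gossip_analyze` can be sketched as a byte reversal plus hex encoding. The function names below are hypothetical and the code is an illustration under that assumption, not the conversion used in this branch.

```rust
/// Convert little-endian feature bytes (byte 0 holds bits 0..=7)
/// into a big-endian lowercase hex string (last byte holds bits 0..=7).
fn le_bytes_to_be_hex(le: &[u8]) -> String {
    le.iter().rev().map(|b| format!("{:02x}", b)).collect()
}

/// Inverse direction: parse big-endian hex back into little-endian bytes.
/// Returns `None` for odd-length input or non-hex characters.
fn be_hex_to_le_bytes(hex: &str) -> Option<Vec<u8>> {
    if hex.len() % 2 != 0 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let mut bytes = Vec::with_capacity(hex.len() / 2);
    for i in (0..hex.len()).step_by(2) {
        bytes.push(u8::from_str_radix(&hex[i..i + 2], 16).ok()?);
    }
    // Parsed bytes are in big-endian order; reverse to get little-endian.
    bytes.reverse();
    Some(bytes)
}
```

For example, the LE bytes `[0x01, 0x02]` become the hex string `0201`, and parsing `0201` gives back `[0x01, 0x02]`.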

New crate `impl_fingerprint` with:

- `src/db.rs`: `FingerprintDb`, `VersionRecord`, `FeatureEntry`, `FeatureRequirement`, `PolicyDefaults`, `Implementation`; a JSON schema that round-trips fully through serde
- `src/lib.rs`: crate root with `mod db`
- `src/main.rs`: CLI scaffold with `scrape`/`classify`/`validate` subcommands (stubs)
- `tests/db_roundtrip.rs`: 23 integration tests covering all types, insert/lookup/iteration, JSON round-trips, edge cases

Pre-existing `gossip_archiver` test failures are not introduced by this branch.

…cords

Source LND feature sets from `feature/default_sets.go` across git tags: v0.15.5-beta, v0.16.4-beta, v0.17.5-beta, v0.18.4-beta.

Source policy defaults from `chainreg/chainregistry.go`: cltv_expiry_delta=80, fee_base_msat=1000, fee_rate=1ppm, htlc_min=1000msat.

Key per-version deltas:

- v0.16: identical to v0.15 (no feature set change)
- v0.17: adds SimpleTaprootChannelsOptionalStaging (bit 181)
- v0.18: TLVOnion promoted from Optional (9) to Required (8); adds RouteBlindingOptional (25), SimpleTaprootChannelsOptionalFinal (81), SimpleTaprootOverlayChansOptional (2025); drops ScriptEnforcedLeaseOptional (2023) from defaults

New files:

- `impl_fingerprint/src/scraper/mod.rs`: `build_db()` entry point
- `impl_fingerprint/src/scraper/lnd.rs`: 4 LND version records + 14 unit tests
- `impl_fingerprint/src/scraper/cln.rs`: stub (empty)
- `impl_fingerprint/src/scraper/ldk.rs`: stub (empty)
- `impl_fingerprint/src/scraper/eclair.rs`: stub (empty)
- `impl_fingerprint/tests/scraper_lnd.rs`: 16 integration tests

Updated:

- `impl_fingerprint/src/lib.rs`: expose `pub mod scraper`
- `impl_fingerprint/src/main.rs`: wire `Scrape` command to `build_db()`

All 53 tests pass (14 lnd unit + 23 db_roundtrip + 16 scraper_lnd), 0 warnings.

…Channel + load_nodes/load_channels)

- `impl_fingerprint/src/input.rs`: four serde types mirroring the `gossip_analyze` dump format
  - `InputNode { pubkey: String, info: InputNodeAnn }`
  - `InputNodeAnn { last_update_timestamp, alias, addresses: Vec<String>, node_features: String (BE hex) }`
  - `InputChannel { node_one/two: String, capacity/scid: Option<u64>, one_to_two/two_to_one: Option<InputDirectionPolicy> }`
  - `InputDirectionPolicy { htlc_min/max_msat, fees_base/proportional, cltv_expiry_delta, last_update_timestamp }`
  - `load_nodes(path)` / `load_channels(path)` → `anyhow::Result<Vec<_>>`
  - Pubkeys and addresses are plain `String` (no `serde_with` dep needed)
  - `node_features` is BE hex, matching the `scraper::lnd::bits_to_hex` encoding exactly
- `impl_fingerprint/src/lib.rs`: `pub mod input` added
- `impl_fingerprint/tests/input_loading.rs`: 8 integration tests
  - `load_{nodes,channels}_from_empty_array`
  - `{node,channel}_round_trips_through_file`
  - `node_features_hex_preserved_round_trip` (long LND-style hex string)
  - `channel_null_direction_round_trips` (all `Option` fields `None`)
  - `load_{nodes,channels}_from_gossip_analyze_json` (inline realistic JSON)
- `impl_fingerprint/Cargo.toml`: `tempfile = "3"` added to dev-dependencies

All 61 tests pass (14 unit + 23 db_roundtrip + 8 input_loading + 16 scraper_lnd), 0 warnings.

…de / classify_all)

`impl_fingerprint/src/classifier.rs` (new module):

- `Confidence` enum: Unknown / Low / Medium / High
- `Classification` struct: pubkey, implementation, version_min/max, confidence, layer
- Layer 1 (exact hex): compare `node_features` hex against `node_feature_hex` per record; single unique match → High; multiple same-impl matches → High with version range
- Layer 2 (heuristic bits): decode BE hex, test each `FeatureRequirement` rule (Mandatory=even bit, Optional=odd bit, Set=either, Not* variants); single-impl match → High, multi-impl → Medium with most-matched impl
- Layer 3 (policy scoring): score node's channel CLTV/fee defaults against each impl's `PolicyDefaults`; best score → Medium (single winner) or Low
- `feature_name_to_bit`: kebab-case name table matching `scraper::lnd` conventions
- `classify_node(node, db, channels)` + `classify_all(nodes, db, channels)`
- `classify_all` filters channels per-node before passing to `classify_node`

`impl_fingerprint/src/lib.rs`: `pub mod classifier` added.

`impl_fingerprint/src/main.rs`: `Commands::Classify` now fully implemented; it loads DB + nodes + channels, calls `classify_all`, and writes JSON output.

`impl_fingerprint/src/scraper/lnd.rs`: remove out-of-hex bits from heuristic lists:

- bit 2023 (script-enforced-lease) removed from `features_v015` list: the 253-byte vector is excluded from `node_feature_hex`, so requiring it in the heuristic would always block matching from the stored hex
- bit 2025 (taproot-overlay-chans) removed from `features_v018` for the same reason

`impl_fingerprint/tests/classifier.rs` (11 integration tests):

- layer1: v0.17 exact hex, v0.18 exact hex, v0.15/v0.16 shared hex same-impl range
- layer2: v0.18 heuristic match after hex cleared (exercises bit-level logic)
- layer3: LND policy match as node_one and node_two; empty features → layer 3
- edge: empty node → Unknown; empty DB → Unknown; `classify_all` order/filtering; `Classification` JSON roundtrip

All 72 tests pass (14 unit + 11 classifier + 23 db_roundtrip + 8 input_loading + 16 scraper_lnd), 0 warnings.
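The Layer 2 rule semantics described above (Mandatory = even bit, Optional = odd bit, Set = either) can be sketched against a big-endian feature vector as follows. The enum here is a simplified, hypothetical stand-in: it models a single `NotSet` variant, whereas the commit mentions several Not* variants, and the function names are assumptions.

```rust
/// Simplified, hypothetical version of the requirement rules.
#[derive(Debug, Clone, Copy, PartialEq)]
enum FeatureRequirement {
    Mandatory,
    Optional,
    Set,
    NotSet,
}

/// Test whether `bit` is set in a big-endian feature vector,
/// where the last byte holds bits 0..=7. Out-of-range bits read as unset.
fn bit_is_set(be: &[u8], bit: usize) -> bool {
    let byte_from_end = bit / 8;
    if byte_from_end >= be.len() {
        return false;
    }
    (be[be.len() - 1 - byte_from_end] & (1u8 << (bit % 8))) != 0
}

/// Check one feature pair against a requirement.
/// `even_bit` is the even (mandatory) bit; the optional bit is `even_bit + 1`.
fn satisfies(be: &[u8], even_bit: usize, req: FeatureRequirement) -> bool {
    let mandatory = bit_is_set(be, even_bit);
    let optional = bit_is_set(be, even_bit + 1);
    match req {
        FeatureRequirement::Mandatory => mandatory,
        FeatureRequirement::Optional => optional,
        FeatureRequirement::Set => mandatory || optional,
        FeatureRequirement::NotSet => !mandatory && !optional,
    }
}
```

With the two-byte vector `[0x02, 0x00]` (only bit 9 set), the pair at bits 8/9 satisfies `Optional` and `Set` but not `Mandatory`, and the pair at bits 6/7 satisfies `NotSet`.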

…t + ValidationReport)

`impl_fingerprint/src/validate.rs`:

- `TrainingSet`: `BTreeMap<pubkey, Implementation>`, `from_json`/`load` constructors
- `ImplStats`: true_positive/false_positive/false_negative/true_negative, `precision()` and `recall()` helpers returning `Option<f64>`
- `ConfusionMatrix`: `BTreeMap<predicted_str, BTreeMap<actual_str, count>>`
- `ValidationReport`: training_set_size, missing_nodes, evaluated, correct, accuracy: `Option<f64>`, per_impl, confusion, correct_by_confidence, correct_by_layer; `summary()` pretty-printer
- `run_validation(training, nodes, channels, db) -> ValidationReport`: indexes nodes by pubkey, counts missing, runs `classify_all` on the eval subset, scores TP/FP/FN/TN per implementation, populates the confusion matrix and correct_by_confidence/layer breakdowns

`impl_fingerprint/src/lib.rs`: `pub mod validate` added.

`impl_fingerprint/src/main.rs`: `Commands::Validate` wired; it loads TrainingSet + DB + nodes + channels, calls `run_validation`, and eprints the summary.

`impl_fingerprint/tests/validate.rs` (13 integration tests):

- TrainingSet: loads from JSON, empty object, invalid JSON, invalid impl name, file
- run_validation: all-correct exact-hex, all-unknown→zero accuracy, missing nodes, empty training set, confusion matrix shape, layer-3 policy path
- ValidationReport: summary smoke test, JSON roundtrip

All 85 tests pass (14 unit + 11 classifier + 23 db + 8 input + 16 scraper + 13 validate), 0 warnings.
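The reason `precision()` and `recall()` return `Option<f64>` is that either denominator can be zero, for instance when an implementation is never predicted. A minimal sketch follows; the field names come from the commit message, while the `record` helper and the exact method bodies are assumptions.

```rust
/// Per-implementation tallies; field names follow the commit message.
#[derive(Debug, Default, Clone, PartialEq)]
struct ImplStats {
    true_positive: u64,
    false_positive: u64,
    false_negative: u64,
    true_negative: u64,
}

impl ImplStats {
    /// Hypothetical helper: tally one evaluated node.
    /// `predicted` / `actual` say whether the node was classified as,
    /// or really is, this implementation.
    fn record(&mut self, predicted: bool, actual: bool) {
        match (predicted, actual) {
            (true, true) => self.true_positive += 1,
            (true, false) => self.false_positive += 1,
            (false, true) => self.false_negative += 1,
            (false, false) => self.true_negative += 1,
        }
    }

    /// TP / (TP + FP), or `None` when the implementation was never predicted.
    fn precision(&self) -> Option<f64> {
        let denom = self.true_positive + self.false_positive;
        (denom > 0).then(|| self.true_positive as f64 / denom as f64)
    }

    /// TP / (TP + FN), or `None` when no node of this implementation was seen.
    fn recall(&self) -> Option<f64> {
        let denom = self.true_positive + self.false_negative;
        (denom > 0).then(|| self.true_positive as f64 / denom as f64)
    }
}
```

Returning `None` instead of `0.0` or `NaN` keeps "no data" distinguishable from "always wrong" in the report.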

`justfile`: 6 new fingerprint targets:

- `fingerprint-scrape` → `impl_fingerprint scrape` (no network)
- `fingerprint-dump` → `gossip_analyze dump` (mainnet, 192.168.0.189:8332)
- `fingerprint-dump-signet` → `gossip_analyze dump` (signet, 192.168.0.189:38332)
- `fingerprint-classify` → `impl_fingerprint classify gossip_dump/`
- `fingerprint-validate ts=…` → `impl_fingerprint validate` with training-set JSON
- `fingerprint` → scrape + classify (dump assumed done)

`mainnet_rpc` / `signet_rpc` variables are baked in from `~/.bitcoin/{bitcoin,bitcoin-signet}.conf`.

`impl_fingerprint/README.md`: full workflow documentation:

- Quick start (3-step scrape → dump → classify)
- Subcommand reference with example JSON output for classify
- Training-set format and validate output example
- justfile target table
- `gossip_analyze` dump timing explanation (convergence + UTXO wait)
- Crate structure tree
- Feature bit encoding note (BE hex, LE flags reversed)

Implement `scraper/cln.rs` with hardcoded records for CLN v23.11.2, v24.02.2, v24.08.2, and v24.11.1, sourced from `common/features.c` (`feature_styles[]` with `NODE_ANNOUNCE_FEATURE = FEATURE_REPRESENT`) and `lightningd/options.c` (`mainnet_config` policy defaults).

Key differences from LND:

- All node-announcement features set at even (mandatory) bit
- CLN-unique features: dual-fund (28), onion-messages (38), quiesce (34), splice (62), peer-backup-storage (40/42)
- Policy defaults: cltv_delta=34, fee_ppm=10, htlc_min=0
- v24.08+ drops anchor_outputs (bit 20, deprecated)

Update classifier `feature_name_to_bit()` map with 14 new entries for CLN-specific feature names.

Add `tests/scraper_cln.rs` with 22 integration tests covering:

- Feature presence/absence per version
- Policy defaults verification
- Hex round-trips and cross-implementation uniqueness
- Classifier integration (exact hex + policy scoring)

All 121 tests pass (up from 85), no new clippy warnings.

Add hardcoded LDK version records for v0.0.118, v0.0.125, v0.1.6, and v0.2.2, sourced from `provided_init_features()`/`provided_node_features()` in `lightning/src/ln/channelmanager.rs` and policy defaults from `lightning/src/util/config.rs`.

Key version evolution:

- v0.0.118: base feature set (no route_blinding)
- v0.0.125: adds route_blinding_optional
- v0.1.6: same as v0.0.125 (dual_fund is cfg-gated)
- v0.2.2: channel_type promoted to required; adds quiesce, splice, provide_storage

LDK-distinctive signals:

- fee_proportional_millionths=0 (vs LND=1, CLN=10)
- cltv_expiry_delta=72 (vs LND=80, CLN=34)
- htlc_minimum_msat=1 (vs LND=1000, CLN=0)
- NotSet constraints for gossip-queries, gossip-queries-ex, amp prevent false heuristic matches against LND/CLN nodes

14 unit tests + 27 integration tests, all passing. Total test count: 162 (up from 121).

Add hardcoded Eclair version records for v0.9.0 and v0.10.0, sourced from `reference.conf` (features block) cross-referenced with `Features.scala` (`NodeFeature` trait) for node announcement inclusion.

Key version evolution:

- v0.9.0: data_loss_protect/gossip_queries/static_remote_key all optional
- v0.10.0: all three promoted to mandatory; adds dual_fund optional

Eclair-distinctive signals:

- fee_proportional_millionths=200 (vs LND=1, CLN=10, LDK=0)
- cltv_expiry_delta=144 (vs LND=80, CLN=34, LDK=72)
- NotSet constraints for amp, keysend
- gossip-queries present (unlike LDK)
- onion-messages present (unlike LND)

13 unit tests + 22 integration tests, all passing. Total test count: 197 (up from 162).

…licate bits_to_hex

- Fix 6 clippy warnings: doc overindent in `lib.rs`, `is_multiple_of()` in `classifier.rs`, collapsible if in `classifier.rs` Layer 2, doc lazy continuation in `ldk.rs`
- Remove unused dependencies: `observer_common`, `lightning-types`
- Deduplicate `bits_to_hex()` from 4 scraper files into `scraper/mod.rs`
- Update `lib.rs` module doc to reflect current state (was stale Phase 2/3/4/5)

Net -67 lines. All 197 tests pass, zero clippy warnings.
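For readers unfamiliar with the encoding, a helper like `bits_to_hex()` turns a list of set feature bits into the big-endian hex string that Layer 1 compares against. The version below is a hypothetical reimplementation written for illustration; the signature and behavior of the function in `scraper/mod.rs` are not shown in this PR text and may differ.

```rust
/// Encode a set of feature bit positions as a big-endian lowercase hex string.
/// The vector is sized to the highest set bit; an empty input yields "".
fn bits_to_hex(bits: &[usize]) -> String {
    let max_bit = match bits.iter().max() {
        Some(&m) => m,
        None => return String::new(),
    };
    let len = max_bit / 8 + 1;
    let mut be = vec![0u8; len];
    for &bit in bits {
        // Bit 0 lives in the last byte of a big-endian vector.
        be[len - 1 - bit / 8] |= 1u8 << (bit % 8);
    }
    be.iter().map(|b| format!("{:02x}", b)).collect()
}
```

Under this encoding a single bit 2023 needs 2023 / 8 + 1 = 253 bytes, which matches the 253-byte vector mentioned in the classifier commit and shows why such high bits were kept out of the stored hex.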

… fixtures

- `input.rs`: `#[serde(default)]` on `InputNodeAnn::node_features` so older gossip dumps that lack the field deserialize gracefully (classifier falls through to Layer 3 policy scoring).
- `tests/fixtures/synthetic_nodes.json`: 18 hand-crafted nodes exercising all three classifier layers plus unknown:
  - 8 Layer-1 (exact hex from DB for LND/CLN/LDK/Eclair)
  - 4 Layer-2 (bit-2 flipped to break exact match, still passes heuristic for exactly one implementation)
  - 4 Layer-3 (no features, channels with implementation-specific policy defaults)
  - 2 Unknown (no data / unrecognized features)
- `tests/fixtures/synthetic_channels.json`: channels with distinctive policy defaults (cltv/fee_ppm) for Layer-2 and Layer-3 nodes.

`demo.sh` walks through 6 phases with color-coded terminal output:

1. Source research: lists the 4 GitHub repos and 14 version tags
2. Build: compiles the binary in release mode
3. Scrape: builds `fingerprint_db.json` (14 records, 4 implementations)
4. Classify: runs 18 synthetic nodes through all 3 classifier layers
5. Validate: checks 100% accuracy against a derived training set
6. Test suite: runs all 197 unit/integration tests

Usage: `./impl_fingerprint/demo.sh`

The 55 unit tests inside `src/scraper/{lnd,cln,ldk,eclair}.rs` duplicated coverage already provided by the 142 integration tests in `tests/`. Remove the inline `#[cfg(test)]` modules to cut ~560 lines of redundancy. All 142 integration tests continue to pass with no coverage loss.

0xlaga added 21 commits March 26, 2026 15:54

chore: ignore local PLAN.md

ec38e41

docs(impl_fingerprint): minor README updates

080c1a7

chore(justfile): minor updates

12a465e

docs(impl_fingerprint): add overview table to README

9d92659

impl_fingerprint: update README for CLN scraper

6439eb6

- Test count: 85 → 121
- Scrape versions table: CLN no longer a stub
- Crate structure: `cln.rs` description + `scraper_cln.rs` test file

docs(fingerprint): update README for completed LDK + Eclair scrapers

78bdc78

- Test count: 121 → 197 (8 integration files + 55 unit tests)
- Scrape table: LDK/Eclair no longer stubs, show actual versions
- Crate structure: LDK/Eclair descriptions + test files listed

docs(fingerprint): note bits_to_hex in mod.rs description

8235a1d

0xlaga wants to merge 21 commits into jharveyb:main from 0xlaga:feat/impl-fingerprinting