Feat/implementation fingerprinting #12
Open
0xlaga wants to merge 21 commits into
…ests
- Add node_features (Vec<u8> LE bytes) to observer_common::types::NodeAnnouncementInfo
- Add node_features (bytes) field to common.proto NodeAnnouncementInfo (field 4)
- Regenerate common.rs from updated proto
- Update all From impls in types.rs to populate node_features via le_flags()
- Add node_features (hex String, BE) to gossip_analyze::NodeAnnInfo
- Add observer_common/tests/proto_roundtrip.rs (8 integration tests)
- Add gossip_analyze/tests/serde_roundtrip.rs (7 integration tests)
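The LE-to-BE step amounts to reversing the byte order before hex-encoding. A minimal std-only sketch, assuming the proto bytes are in the order LDK's `Features::le_flags()` returns (least-significant byte first); both helper names are hypothetical, not functions from this PR:

```rust
/// Hypothetical helper: LE feature bytes (as carried in the proto) to the
/// BE hex string written by gossip_analyze.
fn le_flags_to_be_hex(le: &[u8]) -> String {
    // Most-significant byte first, two lowercase hex digits per byte.
    le.iter().rev().map(|b| format!("{:02x}", b)).collect()
}

/// Hypothetical inverse: BE hex back to LE bytes; None on malformed input.
fn be_hex_to_le_flags(hex: &str) -> Option<Vec<u8>> {
    if hex.len() % 2 != 0 {
        return None;
    }
    let mut bytes = Vec::with_capacity(hex.len() / 2);
    for i in (0..hex.len()).step_by(2) {
        // `get` returns None on a non-ASCII boundary instead of panicking.
        bytes.push(u8::from_str_radix(hex.get(i..i + 2)?, 16).ok()?);
    }
    bytes.reverse(); // BE order -> LE order
    Some(bytes)
}
```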
New crate impl_fingerprint with:
- src/db.rs: FingerprintDb, VersionRecord, FeatureEntry, FeatureRequirement, PolicyDefaults, Implementation; fully serde round-trippable JSON schema
- src/lib.rs: crate root with mod db
- src/main.rs: CLI scaffold with scrape/classify/validate subcommands (stubs)
- tests/db_roundtrip.rs: 23 integration tests covering all types, insert/lookup/iteration, JSON round-trips, edge cases
Pre-existing gossip_archiver test failures were not introduced by this branch.
…cords
Source LND feature sets from feature/default_sets.go across git tags:
v0.15.5-beta, v0.16.4-beta, v0.17.5-beta, v0.18.4-beta
Source policy defaults from chainreg/chainregistry.go:
cltv_expiry_delta=80, fee_base_msat=1000, fee_rate=1ppm, htlc_min=1000msat
Key per-version deltas:
v0.16: identical to v0.15 (no feature set change)
v0.17: adds SimpleTaprootChannelsOptionalStaging (bit 181)
v0.18: TLVOnion Required(8) promoted from Optional(9);
adds RouteBlindingOptional(25), SimpleTaprootChannelsOptionalFinal(81),
SimpleTaprootOverlayChansOptional(2025);
drops ScriptEnforcedLeaseOptional(2023) from defaults
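Each delta above is a bit-level change and therefore changes the encoded hex; moving TLVOnion from bit 9 to bit 8, for example, yields a different string. A std-only sketch of a `bits_to_hex`-style encoder, assuming the BOLT 7 big-endian layout (bit 0 is the least-significant bit of the last byte); the real `scraper::lnd::bits_to_hex` signature may differ:

```rust
/// Hypothetical encoder: set each listed feature bit and emit BE hex.
fn bits_to_hex(bits: &[u16]) -> String {
    let Some(&max) = bits.iter().max() else {
        return String::new(); // no features -> empty vector
    };
    // The highest bit fixes the vector length.
    let len = max as usize / 8 + 1;
    let mut bytes = vec![0u8; len];
    for &b in bits {
        // Bit b sits in byte b / 8 counted from the end (big-endian).
        let idx = len - 1 - b as usize / 8;
        bytes[idx] |= 1u8 << (b % 8);
    }
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}
```

For instance, `bits_to_hex(&[9])` gives `"0200"` while `bits_to_hex(&[8])` gives `"0100"`, so the v0.18 TLVOnion promotion alone is enough to make its exact hex differ from v0.17's.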
New files:
impl_fingerprint/src/scraper/mod.rs - build_db() entry point
impl_fingerprint/src/scraper/lnd.rs - 4 LND version records + 14 unit tests
impl_fingerprint/src/scraper/cln.rs - stub (empty)
impl_fingerprint/src/scraper/ldk.rs - stub (empty)
impl_fingerprint/src/scraper/eclair.rs - stub (empty)
impl_fingerprint/tests/scraper_lnd.rs - 16 integration tests
Updated:
impl_fingerprint/src/lib.rs - expose pub mod scraper
impl_fingerprint/src/main.rs - wire Scrape command to build_db()
All 53 tests pass (14 lnd unit + 23 db_roundtrip + 16 scraper_lnd), 0 warnings.
…Channel + load_nodes/load_channels)
- impl_fingerprint/src/input.rs: four serde types mirroring gossip_analyze dump format
- InputNode { pubkey: String, info: InputNodeAnn }
- InputNodeAnn { last_update_timestamp, alias, addresses: Vec<String>, node_features: String (BE hex) }
- InputChannel { node_one/two: String, capacity/scid: Option<u64>, one_to_two/two_to_one: Option<InputDirectionPolicy> }
- InputDirectionPolicy { htlc_min/max_msat, fees_base/proportional, cltv_expiry_delta, last_update_timestamp }
- load_nodes(path) / load_channels(path) → anyhow::Result<Vec<_>>
- Pubkeys and addresses are plain String (no serde_with dep needed)
- node_features is BE hex — matches scraper::lnd::bits_to_hex encoding exactly
- impl_fingerprint/src/lib.rs: pub mod input added
- impl_fingerprint/tests/input_loading.rs: 8 integration tests
- load_{nodes,channels}_from_empty_array
- {node,channel}_round_trips_through_file
- node_features_hex_preserved_round_trip (long LND-style hex string)
- channel_null_direction_round_trips (all Option fields None)
- load_{nodes,channels}_from_gossip_analyze_json (inline realistic JSON)
- impl_fingerprint/Cargo.toml: tempfile = "3" added to dev-dependencies
All 61 tests pass (14 unit + 23 db_roundtrip + 8 input_loading + 16 scraper_lnd), 0 warnings.
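Consumers of the BE hex field need the inverse of that encoding. A hypothetical decoder from BE hex to the set of set bit numbers, under the same layout assumption (bit 0 in the last byte); it returns `None` for odd-length or non-hex input:

```rust
use std::collections::BTreeSet;

/// Hypothetical decoder: BE feature hex -> set of set bit numbers.
fn hex_to_bits(hex: &str) -> Option<BTreeSet<u16>> {
    if hex.len() % 2 != 0 {
        return None;
    }
    let n = hex.len() / 2; // number of bytes
    let mut bits = BTreeSet::new();
    for i in 0..n {
        let byte = u8::from_str_radix(hex.get(2 * i..2 * i + 2)?, 16).ok()?;
        // Byte i from the left holds bits 8*(n-1-i) ..= 8*(n-1-i)+7.
        let base = (8 * (n - 1 - i)) as u16;
        for j in 0..8u16 {
            if (byte & (1u8 << j)) != 0 {
                bits.insert(base + j);
            }
        }
    }
    Some(bits)
}
```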
…de / classify_all)
impl_fingerprint/src/classifier.rs — new module:
- Confidence enum: Unknown / Low / Medium / High
- Classification struct: pubkey, implementation, version_min/max, confidence, layer
- Layer 1 (exact hex): compare node_features hex against node_feature_hex per record;
single unique match → High; multiple same-impl matches → High with version range
- Layer 2 (heuristic bits): decode BE hex, test each FeatureRequirement rule
(Mandatory=even bit, Optional=odd bit, Set=either, Not* variants);
single-impl match → High, multi-impl → Medium with most-matched impl
- Layer 3 (policy scoring): score node's channel CLTV/fee defaults against
each impl's PolicyDefaults; best score → Medium (single winner) or Low
- feature_name_to_bit: kebab-case name table matching scraper::lnd conventions
- classify_node(node, db, channels) + classify_all(nodes, db, channels)
- classify_all filters channels per-node before passing to classify_node
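The Layer 2 rule semantics above can be sketched as follows. This is an illustrative model, not the crate's `FeatureRequirement` type: it assumes each rule carries the even (mandatory) bit of a BOLT 9 pair, with the optional bit at `pair + 1`, and that a node matches an implementation only when every rule holds.

```rust
use std::collections::BTreeSet;

/// Hypothetical rule shape; the payload is the even bit of a feature pair.
#[derive(Clone, Copy)]
enum Rule {
    Mandatory(u16),
    Optional(u16),
    Set(u16),
    NotMandatory(u16),
    NotOptional(u16),
    NotSet(u16),
}

fn rule_holds(rule: Rule, bits: &BTreeSet<u16>) -> bool {
    let even = |p: u16| bits.contains(&p); // mandatory bit present
    let odd = |p: u16| bits.contains(&(p + 1)); // optional bit present
    match rule {
        Rule::Mandatory(p) => even(p),
        Rule::Optional(p) => odd(p),
        Rule::Set(p) => even(p) || odd(p),
        Rule::NotMandatory(p) => !even(p),
        Rule::NotOptional(p) => !odd(p),
        Rule::NotSet(p) => !even(p) && !odd(p),
    }
}

/// A node fits an implementation's profile only if all rules hold.
fn matches_all(rules: &[Rule], bits: &BTreeSet<u16>) -> bool {
    rules.iter().all(|&r| rule_holds(r, bits))
}
```

A single failing `NotSet` is enough to rule a profile out, which is why those constraints separate implementations whose positive feature lists overlap.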
impl_fingerprint/src/lib.rs: pub mod classifier added
impl_fingerprint/src/main.rs: Commands::Classify now fully implemented —
loads DB + nodes + channels, calls classify_all, writes JSON output
impl_fingerprint/src/scraper/lnd.rs: remove out-of-hex bits from heuristic lists
- bit 2023 (script-enforced-lease) removed from features_v015 list: the
253-byte vector is excluded from node_feature_hex so requiring it in the
heuristic would always block matching from the stored hex
- bit 2025 (taproot-overlay-chans) removed from features_v018 for same reason
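The arithmetic behind that exclusion: bit b lives in byte b / 8 counted from the end, so the smallest vector holding bit 2023 has 2023 / 8 + 1 = 253 bytes (506 hex characters), and bit 2025 needs 254. A std-only sketch with hypothetical helper names:

```rust
/// Bytes needed for a BE feature vector whose highest set bit is `max_bit`.
fn feature_vec_len(max_bit: u16) -> usize {
    max_bit as usize / 8 + 1
}

/// Whether a BE hex string is long enough to carry `bit` at all; a rule
/// requiring a bit the stored hex cannot carry can never be satisfied.
fn hex_can_carry(hex: &str, bit: u16) -> bool {
    hex.len() / 2 >= feature_vec_len(bit)
}
```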
impl_fingerprint/tests/classifier.rs — 11 integration tests:
layer1: v0.17 exact hex, v0.18 exact hex, v0.15/v0.16 shared hex same-impl range
layer2: v0.18 heuristic match after hex cleared (exercises bit-level logic)
layer3: LND policy match as node_one and node_two; empty features → layer 3
edge: empty node → Unknown; empty DB → Unknown; classify_all order/filtering;
Classification JSON roundtrip
All 72 tests pass (14 unit + 11 classifier + 23 db_roundtrip + 8 input_loading
+ 16 scraper_lnd), 0 warnings.
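Layer 1's exact-hex lookup with version-range collapse can be sketched like this. Assumptions not taken from the PR: records are stored oldest-to-newest within an implementation, an empty hex never matches, and a hex shared by two implementations is reported as ambiguous so that later layers decide. `Record` and `Layer1` are hypothetical stand-ins for `VersionRecord` and `Classification`:

```rust
/// Hypothetical minimal record; the real VersionRecord carries more fields.
struct Record {
    implementation: &'static str,
    version: &'static str,
    node_feature_hex: &'static str,
}

#[derive(Debug, PartialEq)]
enum Layer1 {
    NoMatch,
    /// One implementation: (implementation, version_min, version_max).
    Match(&'static str, &'static str, &'static str),
    /// Same hex under several implementations: defer to later layers.
    Ambiguous,
}

fn layer1(node_hex: &str, db: &[Record]) -> Layer1 {
    let hits: Vec<&Record> = db
        .iter()
        .filter(|r| !node_hex.is_empty() && r.node_feature_hex == node_hex)
        .collect();
    match hits.as_slice() {
        [] => Layer1::NoMatch,
        // All hits from one implementation: first = oldest, last = newest.
        [first, ..] if hits.iter().all(|r| r.implementation == first.implementation) => {
            let last = hits[hits.len() - 1];
            Layer1::Match(first.implementation, first.version, last.version)
        }
        _ => Layer1::Ambiguous,
    }
}
```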
…t + ValidationReport)
impl_fingerprint/src/validate.rs:
- TrainingSet: BTreeMap<pubkey, Implementation>, from_json/load constructors
- ImplStats: true_positive/false_positive/false_negative/true_negative,
precision() and recall() helpers returning Option<f64>
- ConfusionMatrix: BTreeMap<predicted_str, BTreeMap<actual_str, count>>
- ValidationReport: training_set_size, missing_nodes, evaluated, correct,
accuracy: Option<f64>, per_impl, confusion, correct_by_confidence,
correct_by_layer; summary() pretty-printer
- run_validation(training, nodes, channels, db) -> ValidationReport:
indexes nodes by pubkey, counts missing, runs classify_all on eval subset,
scores TP/FP/FN/TN per implementation, populates confusion matrix and
correct_by_confidence/layer breakdowns
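The one-vs-rest scoring and the `precision()`/`recall()` helpers returning `Option<f64>` might look like this. It is a sketch assuming `None` is returned exactly when the denominator is zero; the `score` function is a hypothetical stand-in for the per-node tally inside `run_validation`:

```rust
/// Per-implementation counts, using the ImplStats field names listed above.
#[derive(Default)]
struct ImplStats {
    true_positive: u64,
    false_positive: u64,
    false_negative: u64,
    true_negative: u64,
}

impl ImplStats {
    /// TP / (TP + FP); None if nothing was predicted as this implementation.
    fn precision(&self) -> Option<f64> {
        let denom = self.true_positive + self.false_positive;
        (denom > 0).then(|| self.true_positive as f64 / denom as f64)
    }

    /// TP / (TP + FN); None if no training node has this implementation.
    fn recall(&self) -> Option<f64> {
        let denom = self.true_positive + self.false_negative;
        (denom > 0).then(|| self.true_positive as f64 / denom as f64)
    }
}

/// Hypothetical tally of one (predicted, actual) pair for implementation `imp`.
fn score(stats: &mut ImplStats, imp: &str, predicted: &str, actual: &str) {
    match (predicted == imp, actual == imp) {
        (true, true) => stats.true_positive += 1,
        (true, false) => stats.false_positive += 1,
        (false, true) => stats.false_negative += 1,
        (false, false) => stats.true_negative += 1,
    }
}
```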
impl_fingerprint/src/lib.rs: pub mod validate added
impl_fingerprint/src/main.rs: Commands::Validate wired — loads TrainingSet +
DB + nodes + channels, calls run_validation, eprints summary
impl_fingerprint/tests/validate.rs — 13 integration tests:
TrainingSet: loads from JSON, empty object, invalid JSON, invalid impl name, file
run_validation: all-correct exact-hex, all-unknown→zero accuracy, missing nodes,
empty training set, confusion matrix shape, layer-3 policy path
ValidationReport: summary smoke test, JSON roundtrip
All 85 tests pass (14 unit + 11 classifier + 23 db + 8 input + 16 scraper + 13 validate),
0 warnings.
justfile — 6 new fingerprint targets:
fingerprint-scrape → impl_fingerprint scrape (no network)
fingerprint-dump → gossip_analyze dump (mainnet, 192.168.0.189:8332)
fingerprint-dump-signet → gossip_analyze dump (signet, 192.168.0.189:38332)
fingerprint-classify → impl_fingerprint classify gossip_dump/
fingerprint-validate ts=… → impl_fingerprint validate with training-set JSON
fingerprint → scrape + classify (dump assumed done)
mainnet_rpc / signet_rpc variables baked in from ~/.bitcoin/{bitcoin,bitcoin-signet}.conf
impl_fingerprint/README.md — full workflow documentation:
- Quick start (3-step scrape → dump → classify)
- Subcommand reference with example JSON output for classify
- Training-set format and validate output example
- justfile target table
- gossip_analyze dump timing explanation (convergence + UTXO wait)
- Crate structure tree
- Feature bit encoding note (BE hex, LE flags reversed)
Implement scraper/cln.rs with hardcoded records for CLN v23.11.2, v24.02.2, v24.08.2, and v24.11.1, sourced from common/features.c (feature_styles[] with NODE_ANNOUNCE_FEATURE = FEATURE_REPRESENT) and lightningd/options.c (mainnet_config policy defaults).
Key differences from LND:
- All node-announcement features set at even (mandatory) bit
- CLN-unique features: dual-fund (28), onion-messages (38), quiesce (34), splice (62), peer-backup-storage (40/42)
- Policy defaults: cltv_delta=34, fee_ppm=10, htlc_min=0
- v24.08+ drops anchor_outputs (bit 20, deprecated)
Update classifier feature_name_to_bit() map with 14 new entries for CLN-specific feature names.
Add tests/scraper_cln.rs with 22 integration tests covering:
- Feature presence/absence per version
- Policy defaults verification
- Hex round-trips and cross-implementation uniqueness
- Classifier integration (exact hex + policy scoring)
All 121 tests pass (up from 85), no new clippy warnings.
- Test count: 85 → 121
- Scrape versions table: CLN no longer a stub
- Crate structure: cln.rs description + scraper_cln.rs test file
Add hardcoded LDK version records for v0.0.118, v0.0.125, v0.1.6, and v0.2.2, sourced from provided_init_features()/provided_node_features() in lightning/src/ln/channelmanager.rs, with policy defaults from lightning/src/util/config.rs.
Key version evolution:
- v0.0.118: base feature set (no route_blinding)
- v0.0.125: adds route_blinding_optional
- v0.1.6: same as v0.0.125 (dual_fund is cfg-gated)
- v0.2.2: channel_type promoted to required; adds quiesce, splice, provide_storage
LDK-distinctive signals:
- fee_proportional_millionths=0 (vs LND=1, CLN=10)
- cltv_expiry_delta=72 (vs LND=80, CLN=34)
- htlc_minimum_msat=1 (vs LND=1000, CLN=0)
- NotSet constraints for gossip-queries, gossip-queries-ex, and amp prevent false heuristic matches against LND/CLN nodes
14 unit tests + 27 integration tests, all passing. Total test count: 162 (up from 121).
Add hardcoded Eclair version records for v0.9.0 and v0.10.0, sourced from reference.conf (features block) and cross-referenced with Features.scala (NodeFeature trait) for node announcement inclusion.
Key version evolution:
- v0.9.0: data_loss_protect/gossip_queries/static_remote_key all optional
- v0.10.0: all three promoted to mandatory; adds dual_fund optional
Eclair-distinctive signals:
- fee_proportional_millionths=200 (vs LND=1, CLN=10, LDK=0)
- cltv_expiry_delta=144 (vs LND=80, CLN=34, LDK=72)
- NotSet constraints for amp, keysend
- gossip-queries present (unlike LDK)
- onion-messages present (unlike LND)
13 unit tests + 22 integration tests, all passing. Total test count: 197 (up from 162).
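With all four policy profiles listed, Layer 3 scoring can be illustrated using only the two fields given for every implementation (cltv_expiry_delta and fee_proportional_millionths). This sketch assumes one point per matching field per channel policy and reports no winner on a tie or an all-zero score; the crate's actual weights and its Medium/Low confidence mapping are not modeled:

```rust
/// Hypothetical policy carrying only the two fields compared here.
#[derive(Clone, Copy)]
struct Policy {
    cltv_expiry_delta: u16,
    fee_proportional_millionths: u32,
}

/// Defaults quoted in the commit messages above.
const DEFAULTS: [(&str, Policy); 4] = [
    ("lnd", Policy { cltv_expiry_delta: 80, fee_proportional_millionths: 1 }),
    ("cln", Policy { cltv_expiry_delta: 34, fee_proportional_millionths: 10 }),
    ("ldk", Policy { cltv_expiry_delta: 72, fee_proportional_millionths: 0 }),
    ("eclair", Policy { cltv_expiry_delta: 144, fee_proportional_millionths: 200 }),
];

/// Score each implementation against the node's own channel policies and
/// return the unique best one, if any.
fn layer3(policies: &[Policy]) -> Option<&'static str> {
    let mut scores: Vec<(&'static str, usize)> = DEFAULTS
        .iter()
        .map(|(name, d)| {
            let s: usize = policies
                .iter()
                .map(|p| {
                    usize::from(p.cltv_expiry_delta == d.cltv_expiry_delta)
                        + usize::from(p.fee_proportional_millionths == d.fee_proportional_millionths)
                })
                .sum();
            (*name, s)
        })
        .collect();
    scores.sort_by(|a, b| b.1.cmp(&a.1)); // highest score first
    match scores.as_slice() {
        [(best, top), (_, second), ..] if *top > 0 && top > second => Some(*best),
        _ => None,
    }
}
```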
- Test count: 121 → 197 (8 integration files + 55 unit tests)
- Scrape table: LDK/Eclair no longer stubs, show actual versions
- Crate structure: LDK/Eclair descriptions + test files listed
…licate bits_to_hex
- Fix 6 clippy warnings: doc overindent in lib.rs, is_multiple_of() in classifier.rs, collapsible if in classifier.rs Layer 2, doc lazy continuation in ldk.rs
- Remove unused dependencies: observer_common, lightning-types
- Deduplicate bits_to_hex() from 4 scraper files into scraper/mod.rs
- Update lib.rs module doc to reflect current state (was stale Phase 2/3/4/5)
Net -67 lines. All 197 tests pass, zero clippy warnings.
… fixtures
- input.rs: #[serde(default)] on InputNodeAnn::node_features so older
gossip dumps that lack the field deserialize gracefully (classifier
falls through to Layer 3 policy scoring).
- tests/fixtures/synthetic_nodes.json: 18 hand-crafted nodes exercising
all three classifier layers plus unknown:
8 Layer-1 (exact hex from DB for LND/CLN/LDK/Eclair)
4 Layer-2 (bit-2 flipped to break exact match, still passes
heuristic for exactly one implementation)
4 Layer-3 (no features, channels with implementation-specific
policy defaults)
2 Unknown (no data / unrecognized features)
- tests/fixtures/synthetic_channels.json: channels with distinctive
policy defaults (cltv/fee_ppm) for Layer-2 and Layer-3 nodes.
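Deriving a Layer-2 fixture from a Layer-1 one amounts to toggling a single bit in the BE hex. A hypothetical helper, assuming the same big-endian layout (bit 0 in the last byte):

```rust
/// Hypothetical fixture helper: toggle `bit` in a BE feature hex string.
/// Returns None if the hex is malformed or too short to hold the bit.
fn flip_bit(hex: &str, bit: u16) -> Option<String> {
    if hex.len() % 2 != 0 {
        return None;
    }
    let mut bytes = Vec::with_capacity(hex.len() / 2);
    for i in (0..hex.len()).step_by(2) {
        bytes.push(u8::from_str_radix(hex.get(i..i + 2)?, 16).ok()?);
    }
    let from_end = bit as usize / 8;
    if from_end >= bytes.len() {
        return None;
    }
    let idx = bytes.len() - 1 - from_end; // big-endian byte position
    bytes[idx] ^= 1u8 << (bit % 8);
    Some(bytes.iter().map(|b| format!("{:02x}", b)).collect())
}
```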
demo.sh walks through 6 phases with color-coded terminal output:
1. Source research: lists the 4 GitHub repos and 14 version tags
2. Build: compiles the binary in release mode
3. Scrape: builds fingerprint_db.json (14 records, 4 implementations)
4. Classify: runs 18 synthetic nodes through all 3 classifier layers
5. Validate: checks 100% accuracy against a derived training set
6. Test suite: runs all 197 unit/integration tests
Usage: ./impl_fingerprint/demo.sh
The 55 unit tests inside src/scraper/{lnd,cln,ldk,eclair}.rs duplicated
coverage already provided by the 87 integration tests in tests/scraper_*.rs.
Remove the inline #[cfg(test)] modules to cut ~560 lines of redundancy.
All 142 integration tests continue to pass with no coverage loss.
impl_fingerprint: Lightning node implementation fingerprinting
Closes #10.
What's in this PR
- impl_fingerprint/: standalone binary with scrape, classify, and validate subcommands
- observer_common (proto + types) and gossip_analyze (dump plumbing) updated to expose node_features hex
- Version records hardcoded in src/scraper/ from upstream source code; no network access at runtime
- demo.sh exercises the full pipeline end-to-end with color-coded terminal output
Implementation coverage
- LND: feature/default_sets.go, chainreg/chainregistry.go
- CLN: common/features.c, lightningd/options.c
- LDK: lightning/src/ln/channelmanager.rs, lightning/src/util/config.rs
- Eclair: reference.conf, Features.scala
Classifier layers
- Layer 1 (exact hex): node_features hex matches a DB record exactly → high confidence, specific version range.
- Layer 2 (heuristic bits): NotSet constraints are the key differentiators (e.g. LDK never sets gossip-queries/amp, Eclair never sets amp/keysend).
Crate structure