Feat/implementation fingerprinting#12

Open
0xlaga wants to merge 21 commits into jharveyb:main from 0xlaga:feat/impl-fingerprinting

0xlaga commented Apr 1, 2026
impl_fingerprint — Lightning node implementation fingerprinting

Closes #10.

What's in this PR

| Area | Description |
|---|---|
| New crate | impl_fingerprint/: standalone binary with scrape, classify, and validate subcommands |
| Upstream changes | Minimal additions to observer_common (proto + types) and gossip_analyze (dump plumbing) to expose node_features hex |
| Scrapers | Version records for 4 implementations (14 versions), hardcoded in src/scraper/ from upstream source code; no network access at runtime |
| Classifier | Three-layer cascade that short-circuits on first confident match |
| Validation | Training-set scorer with per-implementation precision/recall and confusion matrix |
| Test suite | 142 integration tests, all passing, zero clippy warnings |
| Demo script | demo.sh exercises the full pipeline end-to-end with color-coded terminal output |

Implementation coverage

| Implementation | Versions | Source files studied |
|---|---|---|
| LND | v0.15.5-beta → v0.18.4-beta | feature/default_sets.go, chainreg/chainregistry.go |
| CLN | v23.11.2 → v24.11.1 | common/features.c, lightningd/options.c |
| LDK | v0.0.118 → v0.2.2 | lightning/src/ln/channelmanager.rs, lightning/src/util/config.rs |
| Eclair | v0.9.0 → v0.10.0 | reference.conf, Features.scala |

Classifier layers

  1. Exact hex — node's node_features hex matches a DB record exactly → high confidence, specific version range.
  2. Heuristic bits — individual feature bits tested against each version's mandatory/optional/not-set requirements. NotSet constraints are the key differentiators (e.g. LDK never sets gossip-queries/amp, Eclair never sets amp/keysend).
  3. Policy scoring — channel policy defaults scored against each implementation's known defaults (LND: cltv=80/fee_ppm=1, CLN: cltv=34/fee_ppm=10, LDK: cltv=72/fee_ppm=0, Eclair: cltv=144/fee_ppm=200).
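
The short-circuit behavior of the three layers can be pictured as a chain of fallible functions where the first `Some` wins. The following is a minimal sketch, not the crate's code: the `Node` and `Verdict` structs, the `TOY_LND_HEX` value, and the toy rule inside `layer2_heuristic_bits` are all hypothetical. Only the layer order, the confidence names, and the LND `cltv=80` default come from the description above.

```rust
/// Confidence levels named in the PR (Low is omitted in this toy cascade).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Confidence {
    Unknown,
    Medium,
    High,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct Verdict {
    implementation: Option<&'static str>,
    confidence: Confidence,
    layer: u8, // 0 = no layer produced a match
}

/// Hypothetical, simplified node view used only for this sketch.
struct Node {
    features_hex: String,
    cltv_expiry_delta: Option<u16>,
}

/// Made-up hex standing in for a stored DB record (assumption).
const TOY_LND_HEX: &str = "0200";

type Layer = fn(&Node) -> Option<Verdict>;

fn hit(implementation: &'static str, confidence: Confidence, layer: u8) -> Option<Verdict> {
    Some(Verdict { implementation: Some(implementation), confidence, layer })
}

/// Layer 1: exact comparison against the stored hex.
fn layer1_exact_hex(node: &Node) -> Option<Verdict> {
    if node.features_hex == TOY_LND_HEX { hit("LND", Confidence::High, 1) } else { None }
}

/// Layer 2: toy stand-in for bit heuristics; matches when bit 0 of the last byte is set.
fn layer2_heuristic_bits(node: &Node) -> Option<Verdict> {
    let hex = &node.features_hex;
    let last = hex.get(hex.len().checked_sub(2)?..)?;
    let byte = u8::from_str_radix(last, 16).ok()?;
    if byte & 1 == 1 { hit("toy-impl", Confidence::Medium, 2) } else { None }
}

/// Layer 3: policy default check, reduced here to LND's cltv_expiry_delta of 80.
fn layer3_policy(node: &Node) -> Option<Verdict> {
    if node.cltv_expiry_delta == Some(80) { hit("LND", Confidence::Medium, 3) } else { None }
}

/// Run the layers in order; `find_map` stops at the first layer that returns `Some`.
fn classify(node: &Node) -> Verdict {
    let layers: [Layer; 3] = [layer1_exact_hex, layer2_heuristic_bits, layer3_policy];
    layers
        .iter()
        .find_map(|layer| layer(node))
        .unwrap_or(Verdict { implementation: None, confidence: Confidence::Unknown, layer: 0 })
}
```

Because `find_map` is lazy, later layers never run once an earlier one has matched, which is the property the cascade relies on.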

Crate structure

impl_fingerprint/
├── src/
│   ├── main.rs          # CLI (clap)
│   ├── lib.rs           # public API re-exports
│   ├── scraper/         # {lnd,cln,ldk,eclair}.rs — hardcoded version records
│   ├── db.rs            # fingerprint database (JSON serde)
│   ├── input.rs         # load nodes/channels from gossip_analyze dumps
│   ├── classifier.rs    # 3-layer cascade
│   └── validate.rs      # accuracy scoring + confusion matrix
├── tests/               # 142 integration tests + synthetic fixtures
└── demo.sh              # end-to-end pipeline demo

0xlaga added 21 commits March 26, 2026 15:54
…ests

- Add node_features (Vec<u8> LE bytes) to observer_common::types::NodeAnnouncementInfo
- Add node_features (bytes) field to common.proto NodeAnnouncementInfo (field 4)
- Regenerate common.rs from updated proto
- Update all From impls in types.rs to populate node_features via le_flags()
- Add node_features (hex String, BE) to gossip_analyze::NodeAnnInfo
- Add observer_common/tests/proto_roundtrip.rs (8 integration tests)
- Add gossip_analyze/tests/serde_roundtrip.rs (7 integration tests)
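
The two encodings mentioned in this commit (LE bytes in `observer_common`, BE hex in `gossip_analyze`) differ only in byte order. A minimal sketch of the conversion follows; the function names `le_bytes_to_be_hex` and `be_hex_to_le_bytes` are hypothetical and are not taken from the branch, and whether the real code strips leading zero bytes is not covered here.

```rust
/// Convert little-endian feature bytes (byte 0 holds bits 0..=7) into a
/// big-endian lowercase hex string (the last hex pair holds bits 0..=7).
fn le_bytes_to_be_hex(le: &[u8]) -> String {
    le.iter().rev().map(|b| format!("{:02x}", b)).collect()
}

/// Inverse of the above. Returns None for odd-length or non-hex input.
fn be_hex_to_le_bytes(hex: &str) -> Option<Vec<u8>> {
    if hex.len() % 2 != 0 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let mut out = Vec::with_capacity(hex.len() / 2);
    for i in (0..hex.len()).step_by(2) {
        // Safe to slice by byte index because the input is ASCII hex only.
        out.push(u8::from_str_radix(&hex[i..i + 2], 16).ok()?);
    }
    out.reverse(); // big-endian order back to little-endian
    Some(out)
}
```

For example, a vector with only bit 9 set is `[0x00, 0x02]` in LE bytes and `"0200"` in BE hex.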
New crate impl_fingerprint with:
- src/db.rs: FingerprintDb, VersionRecord, FeatureEntry, FeatureRequirement,
  PolicyDefaults, Implementation — fully serde round-trippable JSON schema
- src/lib.rs: crate root with mod db
- src/main.rs: CLI scaffold with scrape/classify/validate subcommands (stubs)
- tests/db_roundtrip.rs: 23 integration tests covering all types,
  insert/lookup/iteration, JSON round-trips, edge cases

Pre-existing gossip_archiver test failures are not introduced by this branch.
…cords

Source LND feature sets from feature/default_sets.go across git tags:
  v0.15.5-beta, v0.16.4-beta, v0.17.5-beta, v0.18.4-beta

Source policy defaults from chainreg/chainregistry.go:
  cltv_expiry_delta=80, fee_base_msat=1000, fee_rate=1ppm, htlc_min=1000msat

Key per-version deltas:
  v0.16: identical to v0.15 (no feature set change)
  v0.17: adds SimpleTaprootChannelsOptionalStaging (bit 181)
  v0.18: TLVOnion Required(8) promoted from Optional(9);
         adds RouteBlindingOptional(25), SimpleTaprootChannelsOptionalFinal(81),
         SimpleTaprootOverlayChansOptional(2025);
         drops ScriptEnforcedLeaseOptional(2023) from defaults

New files:
  impl_fingerprint/src/scraper/mod.rs   - build_db() entry point
  impl_fingerprint/src/scraper/lnd.rs   - 4 LND version records + 14 unit tests
  impl_fingerprint/src/scraper/cln.rs   - stub (empty)
  impl_fingerprint/src/scraper/ldk.rs   - stub (empty)
  impl_fingerprint/src/scraper/eclair.rs - stub (empty)
  impl_fingerprint/tests/scraper_lnd.rs - 16 integration tests

Updated:
  impl_fingerprint/src/lib.rs  - expose pub mod scraper
  impl_fingerprint/src/main.rs - wire Scrape command to build_db()

All 53 tests pass (14 lnd unit + 23 db_roundtrip + 16 scraper_lnd), 0 warnings.
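
To illustrate how a list of feature bits such as the v0.18 additions maps to the stored hex, here is a minimal sketch of a bit-list encoder. The name `bits_to_hex` appears later in this PR, but the signature and body below are assumptions, not the branch's implementation.

```rust
/// Encode a set of feature bit positions as a big-endian lowercase hex string.
/// Bit 0 is the least significant bit of the last byte. An empty list yields "".
fn bits_to_hex(bits: &[usize]) -> String {
    let max = match bits.iter().max() {
        Some(&m) => m,
        None => return String::new(),
    };
    // The highest bit determines the vector length in bytes.
    let len = max / 8 + 1;
    let mut be = vec![0u8; len];
    for &bit in bits {
        be[len - 1 - bit / 8] |= 1u8 << (bit % 8);
    }
    be.iter().map(|b| format!("{:02x}", b)).collect()
}
```

With this encoding, bit 2023 alone already requires a 253-byte vector (506 hex characters), while bits 8, 25, and 81 fit in 11 bytes.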
…Channel + load_nodes/load_channels)

- impl_fingerprint/src/input.rs: four serde types mirroring gossip_analyze dump format
    - InputNode { pubkey: String, info: InputNodeAnn }
    - InputNodeAnn { last_update_timestamp, alias, addresses: Vec<String>, node_features: String (BE hex) }
    - InputChannel { node_one/two: String, capacity/scid: Option<u64>, one_to_two/two_to_one: Option<InputDirectionPolicy> }
    - InputDirectionPolicy { htlc_min/max_msat, fees_base/proportional, cltv_expiry_delta, last_update_timestamp }
    - load_nodes(path) / load_channels(path) → anyhow::Result<Vec<_>>
    - Pubkeys and addresses are plain String (no serde_with dep needed)
    - node_features is BE hex — matches scraper::lnd::bits_to_hex encoding exactly
- impl_fingerprint/src/lib.rs: pub mod input added
- impl_fingerprint/tests/input_loading.rs: 8 integration tests
    - load_{nodes,channels}_from_empty_array
    - {node,channel}_round_trips_through_file
    - node_features_hex_preserved_round_trip (long LND-style hex string)
    - channel_null_direction_round_trips (all Option fields None)
    - load_{nodes,channels}_from_gossip_analyze_json (inline realistic JSON)
- impl_fingerprint/Cargo.toml: tempfile = "3" added to dev-dependencies

All 61 tests pass (14 unit + 23 db_roundtrip + 8 input_loading + 16 scraper_lnd), 0 warnings.
…de / classify_all)

impl_fingerprint/src/classifier.rs — new module:
  - Confidence enum: Unknown / Low / Medium / High
  - Classification struct: pubkey, implementation, version_min/max, confidence, layer
  - Layer 1 (exact hex): compare node_features hex against node_feature_hex per record;
    single unique match → High; multiple same-impl matches → High with version range
  - Layer 2 (heuristic bits): decode BE hex, test each FeatureRequirement rule
    (Mandatory=even bit, Optional=odd bit, Set=either, Not* variants);
    single-impl match → High, multi-impl → Medium with most-matched impl
  - Layer 3 (policy scoring): score node's channel CLTV/fee defaults against
    each impl's PolicyDefaults; best score → Medium (single winner) or Low
  - feature_name_to_bit: kebab-case name table matching scraper::lnd conventions
  - classify_node(node, db, channels) + classify_all(nodes, db, channels)
  - classify_all filters channels per-node before passing to classify_node
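
The Layer 2 rule check described above can be sketched as follows. This is an illustration under assumptions: the enum carries only four variants (the PR mentions further `Not*` variants that are not spelled out), each rule is keyed by the even bit of a feature pair, and the helper names `decode_be_hex`, `bit_is_set`, `rule_holds`, and `matches_all` are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FeatureRequirement {
    Mandatory, // even bit of the pair is set
    Optional,  // odd bit of the pair is set
    Set,       // either bit is set
    NotSet,    // neither bit is set
}

/// Decode a big-endian hex string into bytes; None on malformed input.
fn decode_be_hex(hex: &str) -> Option<Vec<u8>> {
    if hex.len() % 2 != 0 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    (0..hex.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&hex[i..i + 2], 16).ok())
        .collect()
}

/// Bit 0 is the least significant bit of the last byte; out-of-range bits are unset.
fn bit_is_set(be: &[u8], bit: usize) -> bool {
    let from_end = bit / 8;
    if from_end >= be.len() {
        return false;
    }
    (be[be.len() - 1 - from_end] & (1u8 << (bit % 8))) != 0
}

/// Test one requirement against the feature pair starting at `even_bit`.
fn rule_holds(be: &[u8], even_bit: usize, req: FeatureRequirement) -> bool {
    let even = bit_is_set(be, even_bit);
    let odd = bit_is_set(be, even_bit + 1);
    match req {
        FeatureRequirement::Mandatory => even,
        FeatureRequirement::Optional => odd,
        FeatureRequirement::Set => even || odd,
        FeatureRequirement::NotSet => !even && !odd,
    }
}

/// A version record matches only if every one of its rules holds.
fn matches_all(hex: &str, rules: &[(usize, FeatureRequirement)]) -> Option<bool> {
    let be = decode_be_hex(hex)?;
    Some(rules.iter().all(|&(bit, req)| rule_holds(&be, bit, req)))
}
```

A `NotSet` rule rejects a record as soon as either bit of the pair is present, which is why such constraints separate implementations that otherwise share most of their features.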

impl_fingerprint/src/lib.rs: pub mod classifier added

impl_fingerprint/src/main.rs: Commands::Classify now fully implemented —
  loads DB + nodes + channels, calls classify_all, writes JSON output

impl_fingerprint/src/scraper/lnd.rs: remove out-of-hex bits from heuristic lists
  - bit 2023 (script-enforced-lease) removed from features_v015 list: the
    253-byte vector is excluded from node_feature_hex so requiring it in the
    heuristic would always block matching from the stored hex
  - bit 2025 (taproot-overlay-chans) removed from features_v018 for same reason

impl_fingerprint/tests/classifier.rs — 11 integration tests:
  layer1: v0.17 exact hex, v0.18 exact hex, v0.15/v0.16 shared hex same-impl range
  layer2: v0.18 heuristic match after hex cleared (exercises bit-level logic)
  layer3: LND policy match as node_one and node_two; empty features → layer 3
  edge:   empty node → Unknown; empty DB → Unknown; classify_all order/filtering;
          Classification JSON roundtrip

All 72 tests pass (14 unit + 11 classifier + 23 db_roundtrip + 8 input_loading
+ 16 scraper_lnd), 0 warnings.
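
The Layer 1 behavior for shared hex values (for instance v0.15 and v0.16, which have identical feature sets) can be sketched like this. The struct shapes, the placeholder hex strings, and the choice to return `None` when the hits span several implementations are assumptions made for illustration; the sketch also assumes the records are ordered oldest to newest.

```rust
/// Simplified stand-in for a DB record (field names follow the PR text).
struct VersionRecord {
    implementation: &'static str,
    version: &'static str,
    node_feature_hex: &'static str,
}

#[derive(Debug, PartialEq, Eq)]
struct ExactMatch {
    implementation: &'static str,
    version_min: &'static str,
    version_max: &'static str,
}

/// Toy database; the hex values are placeholders, not real feature vectors.
fn toy_db() -> Vec<VersionRecord> {
    vec![
        VersionRecord { implementation: "LND", version: "v0.15.5-beta", node_feature_hex: "aa" },
        VersionRecord { implementation: "LND", version: "v0.16.4-beta", node_feature_hex: "aa" },
        VersionRecord { implementation: "LND", version: "v0.17.5-beta", node_feature_hex: "bb" },
        VersionRecord { implementation: "CLN", version: "v23.11.2", node_feature_hex: "cc" },
    ]
}

/// Exact hex lookup. Several hits from one implementation collapse into a version range.
fn exact_hex_match(hex: &str, db: &[VersionRecord]) -> Option<ExactMatch> {
    if hex.is_empty() {
        return None;
    }
    let hits: Vec<&VersionRecord> = db.iter().filter(|r| r.node_feature_hex == hex).collect();
    let first = hits.first()?;
    // Assumption: hits from different implementations are treated as no exact match.
    if hits.iter().any(|r| r.implementation != first.implementation) {
        return None;
    }
    Some(ExactMatch {
        implementation: first.implementation,
        version_min: first.version,
        version_max: hits.last()?.version,
    })
}
```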
…t + ValidationReport)

impl_fingerprint/src/validate.rs:
  - TrainingSet: BTreeMap<pubkey, Implementation>, from_json/load constructors
  - ImplStats: true_positive/false_positive/false_negative/true_negative,
    precision() and recall() helpers returning Option<f64>
  - ConfusionMatrix: BTreeMap<predicted_str, BTreeMap<actual_str, count>>
  - ValidationReport: training_set_size, missing_nodes, evaluated, correct,
    accuracy: Option<f64>, per_impl, confusion, correct_by_confidence,
    correct_by_layer; summary() pretty-printer
  - run_validation(training, nodes, channels, db) -> ValidationReport:
    indexes nodes by pubkey, counts missing, runs classify_all on eval subset,
    scores TP/FP/FN/TN per implementation, populates confusion matrix and
    correct_by_confidence/layer breakdowns
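
The per-implementation scoring can be sketched with the field names listed above. The `score_impl` and `confusion` helpers and their `(actual, predicted)` pair input are hypothetical; the `Option<f64>` return type reflects the case where a denominator is zero and the metric is undefined.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
struct ImplStats {
    true_positive: u64,
    false_positive: u64,
    false_negative: u64,
    true_negative: u64,
}

impl ImplStats {
    /// TP / (TP + FP); None when the implementation was never predicted.
    fn precision(&self) -> Option<f64> {
        let denom = self.true_positive + self.false_positive;
        if denom == 0 { None } else { Some(self.true_positive as f64 / denom as f64) }
    }

    /// TP / (TP + FN); None when the implementation never occurs in the labels.
    fn recall(&self) -> Option<f64> {
        let denom = self.true_positive + self.false_negative;
        if denom == 0 { None } else { Some(self.true_positive as f64 / denom as f64) }
    }
}

/// Score one implementation against (actual, predicted) label pairs.
fn score_impl(pairs: &[(&str, &str)], target: &str) -> ImplStats {
    let mut s = ImplStats::default();
    for &(actual, predicted) in pairs {
        match (actual == target, predicted == target) {
            (true, true) => s.true_positive += 1,
            (false, true) => s.false_positive += 1,
            (true, false) => s.false_negative += 1,
            (false, false) => s.true_negative += 1,
        }
    }
    s
}

/// Confusion matrix keyed as predicted -> actual -> count, as in the PR text.
fn confusion(pairs: &[(&str, &str)]) -> BTreeMap<String, BTreeMap<String, u64>> {
    let mut m: BTreeMap<String, BTreeMap<String, u64>> = BTreeMap::new();
    for &(actual, predicted) in pairs {
        *m.entry(predicted.to_string())
            .or_default()
            .entry(actual.to_string())
            .or_insert(0) += 1;
    }
    m
}
```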

impl_fingerprint/src/lib.rs: pub mod validate added
impl_fingerprint/src/main.rs: Commands::Validate wired — loads TrainingSet +
  DB + nodes + channels, calls run_validation, eprints summary

impl_fingerprint/tests/validate.rs — 13 integration tests:
  TrainingSet: loads from JSON, empty object, invalid JSON, invalid impl name, file
  run_validation: all-correct exact-hex, all-unknown→zero accuracy, missing nodes,
    empty training set, confusion matrix shape, layer-3 policy path
  ValidationReport: summary smoke test, JSON roundtrip

All 85 tests pass (14 unit + 11 classifier + 23 db + 8 input + 16 scraper + 13 validate),
0 warnings.
justfile — 6 new fingerprint targets:
  fingerprint-scrape        → impl_fingerprint scrape (no network)
  fingerprint-dump          → gossip_analyze dump (mainnet, 192.168.0.189:8332)
  fingerprint-dump-signet   → gossip_analyze dump (signet, 192.168.0.189:38332)
  fingerprint-classify      → impl_fingerprint classify gossip_dump/
  fingerprint-validate ts=… → impl_fingerprint validate with training-set JSON
  fingerprint               → scrape + classify (dump assumed done)
  mainnet_rpc / signet_rpc variables baked in from ~/.bitcoin/{bitcoin,bitcoin-signet}.conf

impl_fingerprint/README.md — full workflow documentation:
  - Quick start (3-step scrape → dump → classify)
  - Subcommand reference with example JSON output for classify
  - Training-set format and validate output example
  - justfile target table
  - gossip_analyze dump timing explanation (convergence + UTXO wait)
  - Crate structure tree
  - Feature bit encoding note (BE hex, LE flags reversed)
Implement scraper/cln.rs with hardcoded records for CLN v23.11.2,
v24.02.2, v24.08.2, and v24.11.1, sourced from common/features.c
(feature_styles[] with NODE_ANNOUNCE_FEATURE = FEATURE_REPRESENT)
and lightningd/options.c (mainnet_config policy defaults).

Key differences from LND:
- All node-announcement features set at even (mandatory) bit
- CLN-unique features: dual-fund (28), onion-messages (38),
  quiesce (34), splice (62), peer-backup-storage (40/42)
- Policy defaults: cltv_delta=34, fee_ppm=10, htlc_min=0
- v24.08+ drops anchor_outputs (bit 20, deprecated)

Update classifier feature_name_to_bit() map with 14 new entries
for CLN-specific feature names.

Add tests/scraper_cln.rs with 22 integration tests covering:
- Feature presence/absence per version
- Policy defaults verification
- Hex round-trips and cross-implementation uniqueness
- Classifier integration (exact hex + policy scoring)

All 121 tests pass (up from 85), no new clippy warnings.
- Test count: 85 → 121
- Scrape versions table: CLN no longer a stub
- Crate structure: cln.rs description + scraper_cln.rs test file
Add hardcoded LDK version records for v0.0.118, v0.0.125, v0.1.6, and
v0.2.2 sourced from provided_init_features()/provided_node_features() in
lightning/src/ln/channelmanager.rs and policy defaults from
lightning/src/util/config.rs.

Key version evolution:
- v0.0.118: base feature set (no route_blinding)
- v0.0.125: adds route_blinding_optional
- v0.1.6: same as v0.0.125 (dual_fund is cfg-gated)
- v0.2.2: channel_type promoted to required; adds quiesce, splice,
  provide_storage

LDK-distinctive signals:
- fee_proportional_millionths=0 (vs LND=1, CLN=10)
- cltv_expiry_delta=72 (vs LND=80, CLN=34)
- htlc_minimum_msat=1 (vs LND=1000, CLN=0)
- NotSet constraints for gossip-queries, gossip-queries-ex, amp
  prevent false heuristic matches against LND/CLN nodes

14 unit tests + 27 integration tests, all passing.
Total test count: 162 (up from 121).
Add hardcoded Eclair version records for v0.9.0 and v0.10.0 sourced from
reference.conf (features block) cross-referenced with Features.scala
(NodeFeature trait) for node announcement inclusion.

Key version evolution:
- v0.9.0: data_loss_protect/gossip_queries/static_remote_key all optional
- v0.10.0: all three promoted to mandatory; adds dual_fund optional

Eclair-distinctive signals:
- fee_proportional_millionths=200 (vs LND=1, CLN=10, LDK=0)
- cltv_expiry_delta=144 (vs LND=80, CLN=34, LDK=72)
- NotSet constraints for amp, keysend
- gossip-queries present (unlike LDK)
- onion-messages present (unlike LND)
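
With all four implementations' defaults now known, the Layer 3 comparison can be sketched as below. The `cltv`/`fee_ppm` values are the ones quoted in this PR. The scoring scheme (one point per matching field) and the tie handling (first top scorer, reported as `Low`) are assumptions for illustration only.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Confidence {
    Low,
    Medium,
}

struct PolicyDefaults {
    implementation: &'static str,
    cltv_expiry_delta: u16,
    fee_ppm: u32,
}

/// Defaults as listed in the PR description.
const DEFAULTS: [PolicyDefaults; 4] = [
    PolicyDefaults { implementation: "LND", cltv_expiry_delta: 80, fee_ppm: 1 },
    PolicyDefaults { implementation: "CLN", cltv_expiry_delta: 34, fee_ppm: 10 },
    PolicyDefaults { implementation: "LDK", cltv_expiry_delta: 72, fee_ppm: 0 },
    PolicyDefaults { implementation: "Eclair", cltv_expiry_delta: 144, fee_ppm: 200 },
];

/// One point for each field equal to the implementation's default (assumed scheme).
fn score(cltv: u16, fee_ppm: u32, d: &PolicyDefaults) -> u32 {
    (cltv == d.cltv_expiry_delta) as u32 + (fee_ppm == d.fee_ppm) as u32
}

/// Best-scoring implementation: Medium for a single winner, Low on a tie,
/// None when no field matches any implementation.
fn best_policy_match(cltv: u16, fee_ppm: u32) -> Option<(&'static str, Confidence)> {
    let best = DEFAULTS.iter().map(|d| score(cltv, fee_ppm, d)).max()?;
    if best == 0 {
        return None;
    }
    let mut winners = DEFAULTS.iter().filter(|d| score(cltv, fee_ppm, d) == best);
    let first = winners.next()?;
    let confidence = if winners.next().is_none() { Confidence::Medium } else { Confidence::Low };
    Some((first.implementation, confidence))
}
```

A channel advertising `cltv=80` with `fee_ppm=10` matches one LND default and one CLN default, so it yields a tie and only `Low` confidence under this scheme.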

13 unit tests + 22 integration tests, all passing.
Total test count: 197 (up from 162).
- Test count: 121 → 197 (8 integration files + 55 unit tests)
- Scrape table: LDK/Eclair no longer stubs, show actual versions
- Crate structure: LDK/Eclair descriptions + test files listed
…licate bits_to_hex

- Fix 6 clippy warnings: doc overindent in lib.rs, is_multiple_of() in
  classifier.rs, collapsible if in classifier.rs Layer 2, doc lazy
  continuation in ldk.rs
- Remove unused dependencies: observer_common, lightning-types
- Deduplicate bits_to_hex() from 4 scraper files into scraper/mod.rs
- Update lib.rs module doc to reflect current state (was stale Phase 2/3/4/5)

Net -67 lines. All 197 tests pass, zero clippy warnings.
… fixtures

- input.rs: #[serde(default)] on InputNodeAnn::node_features so older
  gossip dumps that lack the field deserialize gracefully (classifier
  falls through to Layer 3 policy scoring).

- tests/fixtures/synthetic_nodes.json: 18 hand-crafted nodes exercising
  all three classifier layers plus unknown:
    8 Layer-1 (exact hex from DB for LND/CLN/LDK/Eclair)
    4 Layer-2 (bit-2 flipped to break exact match, still passes
              heuristic for exactly one implementation)
    4 Layer-3 (no features, channels with implementation-specific
              policy defaults)
    2 Unknown (no data / unrecognized features)

- tests/fixtures/synthetic_channels.json: channels with distinctive
  policy defaults (cltv/fee_ppm) for Layer-2 and Layer-3 nodes.
demo.sh walks through 6 phases with color-coded terminal output:
  1. Source research — lists the 4 GitHub repos and 14 version tags
  2. Build — compiles the binary in release mode
  3. Scrape — builds fingerprint_db.json (14 records, 4 implementations)
  4. Classify — runs 18 synthetic nodes through all 3 classifier layers
  5. Validate — checks 100% accuracy against a derived training set
  6. Test suite — runs all 197 unit/integration tests

Usage: ./impl_fingerprint/demo.sh
The 55 unit tests inside src/scraper/{lnd,cln,ldk,eclair}.rs duplicated
coverage already provided by the 142 integration tests in tests/scraper_*.rs.
Remove the inline #[cfg(test)] modules to cut ~560 lines of redundancy.

All 142 integration tests continue to pass with no coverage loss.

Development

Successfully merging this pull request may close these issues.

Standalone implementation fingerprinting tool

1 participant