javierdejesusda/logflip-closed

logflip-closed

Reverse-replay NTFS $LogFile anti-forensics detection with cryptographically signed, never-false-confirm evidence.

logflip reconstructs the timestamps a record carried before it was stomped by walking the NTFS $LogFile undo chain backward, then corroborates that reconstruction across three independent forensic channels before it will call anything tampering. Its defining property is restraint: the pipeline can return clean, provisional, or anomaly, but it reaches confirmed only when a signed fingerprint database, a real engagement key, and every corroboration gate agree. It is built to err on the side of silence, never on the side of a false accusation.

The command-line tool is logflip; the repository and Python package are logflip-closed.

Why it exists

Timestomping (forging a file's MACE timestamps to hide when it was really created or modified) is one of the oldest moves in the anti-forensics playbook, and one of the hardest to disprove after the fact. The $STANDARD_INFORMATION timestamps an investigator reads today are simply whatever the attacker last wrote. The original values are gone from the attribute itself.

They are not gone from the journal. NTFS records the before image of many metadata operations in $LogFile as undo data, precisely so the volume can roll back an interrupted transaction. logflip turns that recovery mechanism into a forensic one: it replays the undo chain in reverse to reconstruct the pre-tamper attribute state, then asks whether that reconstruction, the $STANDARD_INFORMATION-versus-$FILE_NAME relationship, and a signed tool-fingerprint database all point at the same conclusion.

No public tool performs this backward $LogFile replay to recover pre-tamper state and frame it for tool-family attribution. That reverse-replay-plus-attribution design is the novelty (see Prior art and novelty).

How it works

logflip treats a tampering verdict as something that must be earned by agreement across independent evidence sources, not asserted from any single signal.

NTFS image / volume
        |
        v
  $MFT + $LogFile + $UsnJrnl  parsers (USA fixup, multi-page RCRD reassembly)
        |
        v
  +-----------------------------+-----------------------------+----------------------------+
  | reverse-replay inversion    | SI-vs-FN delta screen       | fingerprint DB attribution |
  | (backward LSN walk to       | (necessary, not sufficient) | (signed, HMAC-verified)    |
  |  recover pre-tamper SI)     |                             |                            |
  +-----------------------------+-----------------------------+----------------------------+
        |
        v
  corroboration gates  ->  signed evidence leaf (HKDF + RFC 8785 + HMAC)  ->  HTML report

Three properties make the output trustworthy:

  • Incomplete-inversion guard. Circular-buffer rollover, cross-client log chains, sequence non-monotonicity, and out-of-bounds undo all resolve to INCONCLUSIVE rather than a guessed result. The tool refuses to reconstruct what it cannot reconstruct soundly.
  • Independent corroboration. A confirmed verdict requires agreement from distinct failure modes, so a single noisy channel can never carry a conviction on its own.
  • Never-false-confirm by construction. The shipped fingerprint families are metadata-only: capture analysis found no sound tool-specific $LogFile byte pattern, so every pattern_hex is empty. An empty-pattern matcher guard makes a spurious byte match structurally impossible. This is an honest encoding of "no validated signature exists yet," not a placeholder pending one.
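The backward walk and its incomplete-inversion guard can be illustrated with a toy model. This is a minimal sketch, not the project's parser: the `LogRecord` shape, the `reverse_replay` name, and the use of `None` to stand in for INCONCLUSIVE are all assumptions made for illustration, and real LFS records carry far more structure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LogRecord:
    """Hypothetical, simplified view of one $LogFile record for one target."""
    lsn: int                 # this record's log sequence number
    prev_lsn: Optional[int]  # previous record in the chain, None at the start
    offset: int              # byte offset inside the attribute that was written
    undo: bytes              # "before" image that the operation overwrote


def reverse_replay(
    current: bytes, records: dict[int, LogRecord], last_lsn: int
) -> Optional[bytes]:
    """Walk the chain backward from last_lsn, applying each undo image.

    Returns the reconstructed earlier state, or None (standing in for
    INCONCLUSIVE) when the chain cannot be followed soundly.
    """
    state = bytearray(current)
    lsn: Optional[int] = last_lsn
    while lsn is not None:
        rec = records.get(lsn)
        if rec is None:
            return None  # record is gone (for example, buffer rollover)
        if rec.prev_lsn is not None and rec.prev_lsn >= rec.lsn:
            return None  # LSNs must strictly decrease on a backward walk
        end = rec.offset + len(rec.undo)
        if rec.offset < 0 or end > len(state):
            return None  # undo image would land outside the attribute
        state[rec.offset:end] = rec.undo  # restore the pre-operation bytes
        lsn = rec.prev_lsn
    return bytes(state)
```

The point of the design is that every unsound condition returns early with no partial result, so a caller can never mistake a half-applied chain for a reconstruction.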

Install

Requires Python 3.11 or newer.

python -m pip install -e ".[dev]"

This installs the runtime dependencies (pydantic, cryptography, pynacl, blake3, rfc8785) plus the development toolchain (pytest, mypy, ruff), and exposes the logflip command. Analyzing a live volume (--volume \\.\C:) requires administrator rights; analyzing a raw image file does not.

Quick start

Investigate a single record you already suspect:

python -m logflip detect --image disk.img --mft-record 5 --output leaf.json --report report.html

For a stomped record, the command exits 2 and prints verdict: provisional (or verdict: confirmed when given a signed database and a real key); for a clean record, it exits 0 and prints verdict: clean.

Let the tool find candidates for you when you do not know which record was touched:

python -m logflip scan --image disk.img --leaf-dir leaves/ --report-dir reports/
mft_record   verdict        tool_family
5            provisional    -
12           clean          -
31           skipped        DetectionError
candidates: 3  findings: 1  skipped: 1

scan exits 2 when at least one finding is returned. Leaf and report files are written only for non-clean records. Any signed leaf can later be re-verified offline with python -m logflip verify-leaf.
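For scripting around scan, the table above can be split into rows and summary counts. The sketch below assumes the text layout is exactly what the sample shows (three whitespace-separated columns and a `candidates:` summary line); that layout is inferred from the example rather than documented as stable, and `parse_scan_output` is a hypothetical helper, not part of logflip.

```python
def parse_scan_output(text: str) -> tuple[list[dict[str, str]], dict[str, int]]:
    """Split scan's printed table into per-record rows and summary counts."""
    rows: list[dict[str, str]] = []
    summary: dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0].isdigit():
            # Data row: record number, verdict, tool family (or "-").
            rows.append(
                {"mft_record": parts[0], "verdict": parts[1], "tool_family": parts[2]}
            )
        elif parts and parts[0] == "candidates:":
            # Summary line: alternating "label:" and integer tokens.
            for label, count in zip(parts[0::2], parts[1::2]):
                summary[label.rstrip(":")] = int(count)
    return rows, summary
```

The process exit code remains the authoritative signal; parsing the table is only useful for listing which records to open first.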

The verdict model

| Verdict | Meaning |
|---|---|
| clean | The reverse replay found no byte disagreement on the target record. |
| provisional | A disagreement and an SI-vs-FN delta were surfaced, but no tool family was confirmed. This is the ceiling with the stub database or a demo key. |
| confirmed | Every corroboration gate passed against a signed database paired with a non-demo key. The stub database and the demo key can never produce this verdict, by design. |
| anomaly | Single-source signal only (an SI-vs-FN delta with no $LogFile coverage). Surfaced by scan --include-mft-deltas as an investigative lead, never as proof. |

| Exit code | Meaning |
|---|---|
| 0 | Clean: no tampering evidence (or, for scan, every candidate was clean). |
| 2 | Finding: at least one provisional or confirmed verdict. Anomalies alone never set this code. |
| 1 | Error: corrupt input, an I/O failure, or a fail-safe stop. |

The never-false-confirm guarantee holds across hundreds of candidates: without a real signed --db and a real --key-file, confirmed is unreachable.
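The two rules that matter most in the tables above, the confirmed ceiling and the exit-code mapping, can be written down in a few lines. This is a sketch of the documented behaviour only: `cap_verdict` and `exit_code` are hypothetical names, verdicts are modelled as plain strings, and the error path (exit code 1) is deliberately left out.

```python
from collections.abc import Iterable

# Verdicts that count as findings for the exit code.
FINDING_VERDICTS = {"provisional", "confirmed"}


def cap_verdict(
    verdict: str, db_signed: bool, key_is_demo: bool, gates: Iterable[bool]
) -> str:
    """Downgrade a would-be 'confirmed' unless every precondition holds."""
    if verdict != "confirmed":
        return verdict
    gate_results = list(gates)
    eligible = (
        db_signed
        and not key_is_demo
        and bool(gate_results)   # an empty gate list proves nothing
        and all(gate_results)
    )
    return "confirmed" if eligible else "provisional"


def exit_code(verdicts: Iterable[str]) -> int:
    """Return 2 when any finding is present, else 0; anomalies never count."""
    return 2 if any(v in FINDING_VERDICTS for v in verdicts) else 0
```

Treating an empty gate list as ineligible keeps the sketch on the cautious side, in the same spirit as the rest of the pipeline.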

Command reference

| Command | Purpose |
|---|---|
| detect | Run the full pipeline on one known $MFT record. |
| scan | Enumerate candidate records and run the pipeline on each. |
| verify-leaf | Re-verify a signed evidence leaf offline. |
| verify-db | Verify the HMAC integrity of a signed fingerprint database. |
| build-db | Build and sign a fingerprint database from a capture manifest. |
| keygen | Generate a sealed 32-byte engagement key. |
| ingest-captures | Turn operator capture bundles into a build-db manifest. |

Every key is read from a raw 32-byte file via --key-file and must never be passed as a plain argument.

detect

python -m logflip detect (--image IMAGE | --volume VOLUME) --mft-record N
                         [--db DB_JSON] [--key-file KEY_FILE] [--variant VARIANT_JSON]
                         [--usnjrnl-record N] [--output LEAF_JSON] [--report REPORT_HTML]

--image and --volume are mutually exclusive, and exactly one is required.

| Flag | Description |
|---|---|
| --image IMAGE | Path to a raw NTFS image file. |
| --volume VOLUME | Live NTFS volume path (for example \\.\C:). Requires administrator rights. |
| --mft-record, -m | Target $MFT record number to investigate (required). |
| --db DB_JSON | Signed fingerprint database. When absent, the stub database is used and the verdict stays provisional at most. |
| --key-file, -k | Raw 32-byte signing key file. When absent, a synthetic demo key is used (cannot produce confirmed). |
| --variant VARIANT_JSON | Selects one variant of a multi-variant database (see variant selection). |
| --usnjrnl-record N | $MFT record of the $UsnJrnl $J stream. Auto-discovered through $Extend when absent. |
| --output, -o | Path to write the signed leaf JSON. |
| --report | Path to write the HTML report. |

Use detect when you already know which record to examine; use scan to let the tool find candidates.

scan

Enumerates every distinct $MFT record referenced by the parsed $LogFile and runs the full per-record pipeline on each.

python -m logflip scan (--image IMG | --volume VOL)
                       [--db DB_JSON] [--key-file KEY_FILE] [--variant VARIANT_JSON]
                       [--leaf-dir DIR] [--report-dir DIR]
                       [--include-mft-deltas] [--mft-range A:B]

$UsnJrnl is auto-discovered via $Extend; there is no --usnjrnl-record option on scan.

| Flag | Description |
|---|---|
| --leaf-dir DIR | Directory to write leaf_<mft_record>.json per finding (created if absent; non-clean records only). |
| --report-dir DIR | Directory to write report_<mft_record>.html per finding (created if absent; non-clean records only). |
| --include-mft-deltas | Opt-in second pass that walks every allocated $MFT record and reports SI-vs-FN deltas without $LogFile coverage at the anomaly tier. |
| --mft-range A:B | Bound the delta walk to a half-open MFT range (0 <= A <= B). No effect without --include-mft-deltas; malformed input exits 1. |

A corrupt or unreadable record is reported as a skipped row and never aborts the scan.
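The --mft-range rule described in the flag list (half-open, 0 <= A <= B, malformed input rejected) is easy to get subtly wrong, so here is one way to validate it. `parse_mft_range` is a hypothetical helper written for this illustration; the tool's own parsing may accept or reject slightly different spellings.

```python
def parse_mft_range(spec: str) -> range:
    """Parse 'A:B' into the half-open range [A, B), enforcing 0 <= A <= B."""
    start_text, sep, end_text = spec.partition(":")
    if sep != ":":
        raise ValueError(f"expected A:B, got {spec!r}")
    for text in (start_text, end_text):
        # Plain ASCII digits only: no signs, spaces, or extra colons.
        if not (text.isascii() and text.isdigit()):
            raise ValueError(f"not a non-negative integer: {text!r}")
    start, end = int(start_text), int(end_text)
    if start > end:
        raise ValueError(f"range start {start} exceeds end {end}")
    return range(start, end)  # B itself is excluded
```

For example, `parse_mft_range("5:8")` covers records 5, 6, and 7, and `parse_mft_range("3:3")` is valid but empty.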

On the anomaly walk. The delta pass is intentionally noisy on a real full $MFT: many legitimate files carry an SI-vs-FN delta because $FILE_NAME is updated only at create, rename, and reparent, not on content or attribute edits. Anomalies are single-source leads that require independent corroboration; they never set exit code 2 and never emit artifacts.

Example with both passes enabled (synthetic test image with two stomped records and one journal-less anomaly slot):

python -m logflip scan --image disk.img --include-mft-deltas
mft_record   verdict        tool_family
5            provisional    -
7            provisional    -
9            clean          -
12           anomaly        -
candidates: 4  findings: 2  anomalies: 1  skipped: 0
anomalies are single-source ($STANDARD_INFORMATION vs $FILE_NAME) and require corroboration; they are not confirmed findings.

Exit is 2 because of the two provisional findings; the record-12 anomaly does not affect it.

Variant selection

Both detect and scan accept --variant VARIANT_JSON, a JSON object string selecting one variant of a multi-variant signed database. The shipped win10_22h2 database carries two variants keyed by Windows build and cluster size, so --variant is required whenever --db points at it:

python -m logflip detect --image disk.img --mft-record 5 \
    --db win10_22h2.json --key-file key.bin \
    --variant '{"windows_build":"win10-22h2","cluster_size_bytes":4096}'

The two shipped variant keys are {"windows_build":"win10-22h2","cluster_size_bytes":4096} and {"windows_build":"win10-22h2","cluster_size_bytes":65536}. The flag is optional only for a flat (single-variant) database.
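When detect is driven from a script, building the --variant string with a JSON encoder and passing arguments as a list sidesteps shell-quoting mistakes. The sketch below uses only flags shown in this README; `variant_arg` and `detect_argv` are hypothetical helper names, and the file paths in the usage note are placeholders.

```python
import json


def variant_arg(windows_build: str, cluster_size_bytes: int) -> str:
    """Encode a variant selector as the compact JSON object --variant expects."""
    selector = {
        "windows_build": windows_build,
        "cluster_size_bytes": cluster_size_bytes,
    }
    return json.dumps(selector, separators=(",", ":"))


def detect_argv(
    image: str, mft_record: int, db: str, key_file: str, variant: str
) -> list[str]:
    """Build an argument list suitable for subprocess.run (no shell involved)."""
    return [
        "python", "-m", "logflip", "detect",
        "--image", image,
        "--mft-record", str(mft_record),
        "--db", db,
        "--key-file", key_file,
        "--variant", variant,
    ]
```

A call such as `detect_argv("disk.img", 5, "win10_22h2.json", "key.bin", variant_arg("win10-22h2", 4096))` yields the same invocation as the shell example above, with the key still supplied by file path rather than by value.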

verify-leaf

Re-verifies a signed leaf against recovered key material: it recomputes the leaf HMAC from the supplied key and constant-time compares it against the stored MAC.

python -m logflip verify-leaf --key-file KEY_FILE LEAF_JSON

Exit 0 on PASS, 1 on FAIL or error.

verify-db

Verifies the HMAC of a signed fingerprint database over its RFC 8785 canonical form (every field except hmac). Variant-independent: it works on flat and multi-variant databases without a --variant argument.

python -m logflip verify-db --key-file KEY_FILE DB_JSON

Exit 0 on PASS, 1 on FAIL (prints db_integrity_failure) or error.
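The check that verify-db performs (MAC over every field except hmac, compared in constant time) follows a common pattern, sketched here with the standard library. Several details are assumptions: the MAC is taken to be HMAC-SHA256 stored as hex, the key is used directly rather than derived, and `canonical_bytes` is only an approximation of RFC 8785 that is adequate for simple string and integer values, not a conforming JCS implementation.

```python
import hashlib
import hmac
import json
from typing import Any


def canonical_bytes(obj: Any) -> bytes:
    """Approximate canonical JSON: sorted keys, no whitespace, UTF-8.

    NOT full RFC 8785: number formatting and key ordering rules differ
    in edge cases. Good enough for this illustration only.
    """
    return json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def _mac(db: dict[str, Any], key: bytes) -> str:
    body = {k: v for k, v in db.items() if k != "hmac"}  # sign all but "hmac"
    return hmac.new(key, canonical_bytes(body), hashlib.sha256).hexdigest()


def sign_db(db: dict[str, Any], key: bytes) -> dict[str, Any]:
    """Return a copy of db with a fresh 'hmac' field."""
    body = {k: v for k, v in db.items() if k != "hmac"}
    return {**body, "hmac": _mac(body, key)}


def verify_db(db: dict[str, Any], key: bytes) -> bool:
    """Recompute the MAC and compare it in constant time."""
    stored = db.get("hmac")
    if not isinstance(stored, str):
        return False
    return hmac.compare_digest(_mac(db, key).encode(), stored.encode())
```

Because the MAC covers the canonical form, reordering fields leaves verification intact while changing any value breaks it.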

build-db

Constructs and signs a fingerprint database artifact from a capture manifest JSON file.

python -m logflip build-db --key-file KEY_FILE --output OUTPUT_JSON MANIFEST_JSON

keygen

Generates a 32-byte cryptographically secure random engagement key and writes it as raw bytes with mode 0600. The key bytes are never printed, and the command refuses to overwrite an existing file unless --force is given.

python -m logflip keygen --output KEY_FILE [--force]

The resulting file is the --key-file argument for build-db, detect, scan, verify-leaf, and verify-db. Seal it per docs/custody_sop.md immediately after generation.
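The behaviour described for keygen (32 random bytes, mode 0600, no overwrite without --force, bytes never printed) maps onto a small amount of standard-library code. This is a sketch under assumptions, not the project's implementation: `write_key` is a hypothetical name, and the 0600 mode is only meaningful on POSIX systems.

```python
import os
import secrets

KEY_LENGTH = 32  # bytes, matching the raw key size described above


def write_key(path: str, force: bool = False) -> None:
    """Create a key file with owner-only permissions.

    Without force, O_EXCL makes creation fail if the file already exists,
    so an existing key is never silently replaced.
    """
    flags = os.O_WRONLY | os.O_CREAT | (os.O_TRUNC if force else os.O_EXCL)
    fd = os.open(path, flags, 0o600)
    with os.fdopen(fd, "wb") as handle:
        # The creation mode is ignored for pre-existing files, so set it
        # explicitly before any key material is written.
        os.chmod(path, 0o600)
        handle.write(secrets.token_bytes(KEY_LENGTH))
```

Using `O_EXCL` rather than a separate existence check closes the gap between checking and creating, which matters when the file holds signing material.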

ingest-captures

Reads one or more operator-supplied capture bundles (per docs/capture_spec.md), verifies SHA-256 integrity, validates manifest fields, and writes a build-db manifest.

python -m logflip ingest-captures --output OUTPUT_JSON \
    [--dirty-shutdown-fp-rates PATH] CAPTURE_DIR [CAPTURE_DIR ...]

When --dirty-shutdown-fp-rates is absent, a placeholder block with zero values is written; supply a JSON file with measured rates to embed them instead. This is the path used to build the shipped database from authenticated captures. Malformed manifests, non-UTF-8 inputs, and wrong-typed fields are rejected with a clean error, never an uncaught traceback.

Evidence integrity

Every artifact logflip emits is cryptographically bound to the engagement key:

  • Signing. Leaf and database HMAC keys are derived with HKDF-SHA256 under domain-separated salt and info constants, so the leaf key and the database key can never collide.
  • Canonicalization. Signatures are computed over the RFC 8785 (JCS) canonical form, so a byte-stable serialization is signed regardless of field ordering.
  • Verification. All MAC comparisons are constant-time. A single flipped byte in a signed database makes verify-db report db_integrity_failure; a tampered leaf fails verify-leaf.
  • Key handling. Keys live only in raw 32-byte files passed via --key-file, never on the command line. keygen writes them 0600 and never prints the bytes.
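Domain-separated derivation, as described in the signing bullet, means one engagement key yields different subkeys for different purposes. The sketch below implements HKDF-SHA256 as specified in RFC 5869 using only the standard library; the salt and info labels are hypothetical placeholders, since the project's actual constants are not given here.

```python
import hashlib
import hmac

HASH_LEN = hashlib.sha256().digest_size  # 32 bytes


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract, then expand."""
    if not 0 < length <= 255 * HASH_LEN:
        raise ValueError("requested length out of range for HKDF-SHA256")
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()  # extract step
    okm, block, counter = b"", b"", 1
    while len(okm) < length:  # expand step
        block = hmac.new(
            prk, block + info + bytes([counter]), hashlib.sha256
        ).digest()
        okm += block
        counter += 1
    return okm[:length]


# Hypothetical labels: the real salt and info constants are not documented here.
SALT = b"example-salt"
LEAF_INFO = b"example/leaf-hmac"
DB_INFO = b"example/db-hmac"


def derive_keys(engagement_key: bytes) -> tuple[bytes, bytes]:
    """Derive separate leaf and database HMAC keys from one engagement key."""
    leaf_key = hkdf_sha256(engagement_key, SALT, LEAF_INFO)
    db_key = hkdf_sha256(engagement_key, SALT, DB_INFO)
    return leaf_key, db_key
```

Because the info label is mixed into every expand block, a MAC produced under the leaf key is of no use for forging a database signature, and vice versa.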

Engagement-key custody, escrow, and destruction are documented in docs/custody_sop.md.

Honest scope and limitations

This section documents the honest boundaries of what the shipped code can and cannot conclude from real Windows NTFS captures. It is written as methodology transparency consistent with Daubert reliability requirements.

Real LFS records are parsed; attribution is the gap

The reverse-replay channel parses both the synthetic LFS fixtures generated by the test suite and real Windows $LogFile journals. The forward parser (logflip/logfile/forward.py) reads the LFS record header length from the restart area rather than assuming a fixed value (0x30, a 48-byte header, on the captured Win10 22H2 and Win8.1 builds). It reassembles multi-page log records by concatenating the data area of each RCRD page before walking records, so a record that spans a page boundary is recovered rather than dropped (_iter_real_raw_records). The real-LFS path is exercised against committed real-capture fixtures by tests/test_real_lfs.py.
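The USA fixup and multi-page reassembly named in the pipeline diagram can be sketched as follows. The fixup logic follows the general NTFS multi-sector header layout (update sequence array offset at byte 4, entry count at byte 6, 512-byte stride). Everything else is an assumption for illustration: the function names are hypothetical, `data_offset` must be supplied by the caller, and this is not the code in logflip/logfile/forward.py.

```python
import struct
from collections.abc import Iterable

SECTOR_SIZE = 512  # stride protected by each update sequence entry


def apply_usa_fixup(page: bytes) -> bytes:
    """Restore the last two bytes of each sector from the update sequence array."""
    if len(page) < 8:
        raise ValueError("page too short for a multi-sector header")
    usa_offset, usa_count = struct.unpack_from("<HH", page, 4)
    # Entry 0 is the update sequence number; entries 1..n hold original bytes.
    if usa_count < 1 or usa_offset + 2 * usa_count > len(page):
        raise ValueError("update sequence array out of bounds")
    if (usa_count - 1) * SECTOR_SIZE > len(page):
        raise ValueError("update sequence array covers more than the page")
    usn = page[usa_offset:usa_offset + 2]
    fixed = bytearray(page)
    for i in range(1, usa_count):
        tail = i * SECTOR_SIZE - 2
        if page[tail:tail + 2] != usn:
            raise ValueError(f"sector {i} tail mismatch: torn or corrupt page")
        entry = usa_offset + 2 * i
        fixed[tail:tail + 2] = page[entry:entry + 2]
    return bytes(fixed)


def reassemble(pages: Iterable[bytes], data_offset: int) -> bytes:
    """Concatenate the data area of each fixed-up page into one buffer."""
    return b"".join(apply_usa_fixup(p)[data_offset:] for p in pages)
```

Fixup has to happen before concatenation; otherwise the sentinel bytes at each sector tail would corrupt any record that spans a sector or page boundary.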

The residual limitation is attribution, not parsing. The parser recovers redo/undo records from real journals, but round-1 and round-2 captures yielded no stable, non-trivially-sparse, tool-specific $LogFile byte pattern. Because every shipped family therefore carries an empty pattern_hex, the matcher can never bind a byte pattern to a real capture, and the pipeline ceiling on real input is provisional: the reverse replay and the SI-vs-FN delta can surface a finding, but byte-pattern tool-family attribution is unavailable, so evil_confirmed is never reached. This is the correct never-false-confirm behavior.

SetMACE (dismount raw-write): non-attributable by design

SetMACE writes raw sectors after dismounting the volume, bypassing the NTFS log entirely. It emits zero $LogFile transactions and zero new $UsnJrnl reason flags for the target record, and it overwrites both the $STANDARD_INFORMATION (SI) and $FILE_NAME (FN) timestamp fields with the same value, deliberately erasing the SI-vs-FN delta the pipeline uses as an independent signal. SetMACE-modified records are therefore non-attributable through any of the three pipeline channels: on a real image, a SetMACE-stomped file is indistinguishable from a legitimately old file unless out-of-band context is available.

SI-vs-FN delta: a high-sensitivity screen with benign false anomalies

A nonzero SI-vs-FN delta is a necessary signal for detecting many timestomping tools (those that update SI only), but it is not sufficient for attribution. Many legitimate files carry a nonzero delta because $FILE_NAME is updated only at create, rename, and reparent. The scan anomaly tier surfaces these as investigative leads, not findings, and the channel is gated as INCONCLUSIVE when no corroborating journal evidence exists. It never produces evil_confirmed on its own.
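The screen itself is a field-by-field comparison of two sets of NTFS timestamps, which are stored as FILETIME values (100-nanosecond ticks since 1601-01-01 UTC). The sketch below shows that comparison in isolation; the field names and the `si_fn_delta_fields` helper are hypothetical, and the result is a lead to corroborate, never a verdict.

```python
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(filetime: int) -> datetime:
    """Convert a FILETIME tick count to an aware UTC datetime.

    Precision below one microsecond is dropped.
    """
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)


def si_fn_delta_fields(si: dict[str, int], fn: dict[str, int]) -> list[str]:
    """List the timestamp fields whose SI and FN values differ.

    Only fields present in both mappings are compared.
    """
    return sorted(name for name in si.keys() & fn.keys() if si[name] != fn[name])
```

For instance, the 2009-01-01T00:00:00Z value used in the lab captures corresponds to FILETIME 128752416000000000, and a record whose SI created field holds that value while FN holds a later one would be reported with `["created"]`.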

Byte-pattern attribution: no validated pattern yet

Round-1 and round-2 lab captures on Windows 10 22H2 yielded no stable, non-trivially-sparse byte pattern reliably specific to any tested tool family. The only injected bytes shared across runs were the fake timestamp value (2009-01-01T00:00:00Z), which would also match any legitimately 2009-dated file and is therefore an unreliable discriminator.

The shipped database (logflip/fingerprint/data/win10_22h2.manifest.json) records the structural characterization of each tool family (write method, journal behavior, SI-vs-FN relationship, cluster-size-dependent behavior) but contains no byte patterns: every family has an empty pattern_hex and a confidence of 0.0. The per-family fp_rate is 0.0 by construction (a structural zero): the empty-pattern matcher guard precludes any byte match. The database-level dirty_shutdown_fp_rates block is MEASURED (0.000 over 42 real-Windows images, a controlled-corpus methodology demonstration, not a real-world population rate; see docs/false_positive_rates.md).

Coverage window

The $LogFile and $UsnJrnl rollover blind window (roughly 4 to 12 hours on active volumes) is the primary coverage limit: events older than the live journal window yield INCONCLUSIVE.

Prior art and novelty

NTFS Log Tracker, the closest prior-art tool, reaches roughly 50 percent fidelity on the detection-signal layer: it parses $LogFile forward records and surfaces some timestamp evidence, but it does not perform the backward LSN walk required to reconstruct pre-tamper attribute state. No public tool performs reverse replay of the NTFS $LogFile to reconstruct pre-tamper state and identify the tampering tool family. That combination, walking undo records backward to recover the original SI timestamps and then attributing the tampering tool against a signed fingerprint database, is the novelty of logflip.

Project status and quality

logflip-closed is v0.1.0. The core detection engine, the signed-evidence pipeline, and the documentation are complete and verified.

| Signal | State |
|---|---|
| Test suite | 809 passing, 18 skipped (the skips require ntfsprogs/ntfs-3g or a live NTFS volume) |
| Type checking | mypy --strict clean across the package |
| Linting | ruff clean |
| Python | 3.11+ |

The honest-by-design posture above is a feature, not a gap: the tool is engineered never to emit a false confirmation, and that property is verified by the suite rather than asserted.

Development

python -m pip install -e ".[dev]"
python -m pytest                 # full suite
python -m mypy logflip --strict  # type gate
python -m ruff check logflip tests

Tests marked lab need a real NTFS environment (a live Windows volume with admin, or the Linux ntfsprogs/ntfs-3g toolchain) and are skipped automatically where that environment is absent.

Documentation

| Document | Contents |
|---|---|
| SETUP.md | The two-environment (Windows capture / SIFT analysis) execution handoff. |
| docs/capture_spec.md | The capture-bundle format consumed by ingest-captures. |
| docs/custody_sop.md | Engagement-key generation, sealing, escrow, and destruction. |
| docs/false_positive_rates.md | The controlled-corpus false-positive methodology and measured results. |

License

No license is currently declared for this repository. Until a LICENSE file is added, all rights are reserved by the author and the code is provided for evaluation only. A license must be chosen before any public release.
