Skip to content

fuzzamoto-libafl: Assertions as feedback#69

Draft
dergoegge wants to merge 1 commit into
masterfrom
assertions
Draft

fuzzamoto-libafl: Assertions as feedback#69
dergoegge wants to merge 1 commit into
masterfrom
assertions

Conversation

@dergoegge

Copy link
Copy Markdown
Member

No description provided.

@maflcko maflcko left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice. left a comment, but feel free to ignore.

Comment on lines +1 to +10
# Assertions

Fuzzamoto implements a feedback-guided assertion system inspired by
[Antithesis's sometimes
assertions](https://antithesis.com/docs/best_practices/sometimes_assertions/),
designed to both validate program correctness and guide fuzzing toward
interesting execution states.

The assertion system is only available when fuzzing with
[`fuzzamoto-libafl`](../usage/libafl.md).

@maflcko maflcko Dec 1, 2025

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docs you link to are a bit vague around "sometimes assertions" being able to fail. Also, in the current implementation they are not able to fail, even if the fuzzing completes?

I am thinking, it could make sense to have both:

  • A macro to guide the fuzz engine to mark a condition as interesting and provide the fuzz engine with some kind of distance to reach it. Maybe this could be called reachability_goal(...) (or coverage_goal(...))
  • A macro to ensure different interesting "leaf" coverage is hit when iterating over a pre-generated, static set of (minimized) fuzz inputs in one folder. Maybe this could be called reachability_assert(...) (or coverage_assert(...)).

Such "reachability asserts" could help to more reliably trigger major coverage drops (google/oss-fuzz#11398). Not sure if this is in-scope for fuzzamoto, but I wanted to mention it, because it seems interesting in the greater fuzzing scope.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docs you link to are a bit vague around "sometimes assertions" being able to fail.

The way I understand it, failure for these assertions is defined as "over all tests we ran, the condition was never true", so in the context here, the following assertion would fail if none of the generated testcases ever managed to submit a valid transaction to the mempool.

assert_sometimes!(cond: mempool_size > 0, "Mempool is not empty");

This means that a failed assertion might flip to being satisfied over time as the fuzzing potentially discovers the right input. And in the other direction, a satisfied assertion might flip into failure if for some reason the coverage/state can no longer be reached.

Also, in the current implementation they are not able to fail, even if the fuzzing completes?

Yea in my implementation the sometimes assertions don't fail in a normal sense, but their status for a specific campaign can be inspected by looking at the assertion.txt file that is generated, e.g.:

✗ Sometimes gt(127, 1000): Mempool may contain more than 1000 txs
✓ Always cond(true): One active tip must exist
✓ Always lt(960096, 5000000): Mempool usage does not exceed the maximum
✓ Always lte(525000000000, 2100000000000000): Coin supply is within expected limits
✓ Sometimes cond(true): Block tip may change
✓ Sometimes cond(true): Mempool is not empty
✓ Sometimes cond(true): Node under test should send addr messages

I think I'll export the assertion "results" in a machine readable way, such that surrounding infrastructure can keep track and report issues based on them (e.g. coverage drops). Maybe also add some stats about the assertions to the stdout fuzzer output.

Such "reachability asserts" could help to more reliably trigger

I'll think about your suggested macros but I feel like the macros I have now could already satisfy those needs?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'll export the assertion "results" in a machine readable way, such that surrounding infrastructure can keep track and report issues based on them (e.g. coverage drops). Maybe also add some stats about the assertions to the stdout fuzzer output.

Such "reachability asserts" could help to more reliably trigger

I'll think about your suggested macros but I feel like the macros I have now could already satisfy those needs?

Yes, if your macro is designed so that surrounding infra decides how to treat a coverage miss, then both needs (explicitly assert on coverage, or just set a silent coverage goal) are satisfied. The only difference is that one approach puts the treatment of coverage misses in the source code directly, the other approach puts the treatment in the surrounding infra.

I guess a third option could also be to keep a single macro and set an env var to denote how the macro should behave: COVERAGE_ASSERT=0/1.

But all of this can trivially be implemented later and any approach looks good to me.

@Crypt-iQ Crypt-iQ left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the code and it's clear and makes sense to me, will run. Left some clarifying questions.

Comment thread fuzzamoto/src/assertions.rs Outdated
Comment thread fuzzamoto/src/assertions.rs Outdated
("Sometimes", detail, msg)
}
AssertionScope::Always(inner, msg) => {
fires = !fires;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wondering why this gets inverted? Is it because it was already inverted when assertion.evaluate calls distance?

let previous = self.assertions.get(&new.message());

let result = match (previous, &new) {
(None, new) => new.evaluate() || !self.only_always_assertions,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is !self.only_always_assertions needed? Is it so that a Sometimes entry can be put into self.assertions and then later evaluate_assertion calls get closer in distance?

Comment thread fuzzamoto-libafl/src/stages/mod.rs Outdated
@@ -1,3 +1,4 @@
#include <assert.h>

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious why this is needed?

Comment thread fuzzamoto/src/assertions.rs Outdated
Comment thread doc/src/design/assertions.md Outdated
Comment thread doc/src/design/assertions.md
@dergoegge

Copy link
Copy Markdown
Member Author

Just rebased, haven't addressed review feedback yet

@dergoegge dergoegge force-pushed the assertions branch 2 times, most recently from 8a91e66 to 7503930 Compare February 13, 2026 17:05
Comment thread fuzzamoto-libafl/src/instance.rs Outdated
ConstFeedback::new(self.options.minimize_input.is_none()),
map_feedback
// Every 5th instance (skipping 0) has coverage feedback disabled
ConstFeedback::new(self.client_description.core_id().0 % 5 != 1),

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this is in a feedback_or which evaluates all arguments and does not short-circuit, the AssertionFeedback still calls is_interesting for all cores. If core 0 finds an input that also triggered "better" assertions, the minimization step will ensure that the assertions are maintained.

Also, the map_feedback may not find an input interesting which then makes the AssertionFeedback run for cores where the feedback should be disabled.

@Crypt-iQ

Crypt-iQ commented Apr 13, 2026

Copy link
Copy Markdown
Contributor

I noticed that even with this PR (and even with many of my own tweaks), the mempool size always plateaued between 100 and 200. I took a benchmark of this PR compared to a baseline (this PR with the AssertionFeedback code deleted) for 12 hours on 12 cores and got the following graph:
fuzzamoto-assertion-plot-12h-12c

The graph matches what I expected (and have observed without numbers in other runs) that assertion feedback 1. does help reach larger mempool sizes, and 2. a plateau definitely occurs. When assertion feedback is enabled, the fuzzer populates the mempool quickly and then has trouble populating more as time goes on. When the feedback is disabled, the fuzzer takes longer to populate the mempool, the mempool is smaller overall, and it also gets more difficult to populate as time goes on. I tried several approaches to improve this (e.g. having a custom scheduler that prioritized assertion improvements even more, bumping the transaction generator weights), but to no avail. My hunch is that the plateau occurs because of a lack of feedback on whether a transaction is accepted into the mempool or not and also whether a conflicting transaction(s) has now become invalid. I ran #118 with assertion feedback and with the -debug=validation, -debug=mempool, and -debug=mempoolrej log categories. I found that most of the errors for rejecting a transaction were due to bad-txns-inputs-missingorspent. This suggests that transactions had parents which were replaced or were never valid.

Some things I noticed which contribute to the problem:

  • choose_index for the transaction generators may invalidate a chain of transactions if inserted in the IR at a point before the chain is created. This also lets us test the RBF logic so it is both good and bad.
  • get_random_utxos may consume a lot of UTXOs. If the transaction is then invalid, all of the UTXOs are marked as spent and a subsequent call to get_random_utxos won't return them. The exception being that choose_index for a generator may return a point before these UTXOs get consumed which allows them to be used.
  • An invalid transaction's UTXOs will be marked as available despite them being unusable.
  • (minor) TxoGenerator must run before the UTXOs from the context are usable and it only loads one at a time. I think it would be more efficient if it instead loaded all of the UTXOs at once.

Ultimately though, the issue imo is not the four things listed above but instead a lack of feedback on whether the fuzzer is making valid transactions. I'm not sure what that feedback would look like and how complex it would be to track validity and invalidity. This is tangentially related to this PR since the assertion feedback is useful and clearly improves coverage, but in the mempool case it is kind of hampered by other factors that I didn't expect.

@dergoegge dergoegge force-pushed the assertions branch 2 times, most recently from c398b60 to 6d2477c Compare May 14, 2026 13:02
@dergoegge

Copy link
Copy Markdown
Member Author

I found that most of the errors for rejecting a transaction were due to bad-txns-inputs-missingorspent. This suggests that transactions had parents which were replaced or were never valid.

I think #87 was also trying to address this. I.e. use probing to make smarter mutations wrt extending the mempool

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants