fuzzamoto-libafl: Assertions as feedback by dergoegge · Pull Request #69 · oss-garage/fuzzamoto

dergoegge · 2025-11-28T16:53:42Z

No description provided.

maflcko

nice. left a comment, but feel free to ignore.

maflcko · 2025-12-01T10:50:40Z

+# Assertions
+
+Fuzzamoto implements a feedback-guided assertion system inspired by
+[Antithesis's sometimes
+assertions](https://antithesis.com/docs/best_practices/sometimes_assertions/),
+designed to both validate program correctness and guide fuzzing toward
+interesting execution states.
+
+The assertion system is only available when fuzzing with
+[`fuzzamoto-libafl`](../usage/libafl.md).


The docs you link to are a bit vague around "sometimes assertions" being able to fail. Also, in the current implementation they are not able to fail, even if the fuzzing completes?

I am thinking, it could make sense to have both:

1. A macro to guide the fuzz engine to mark a condition as interesting and provide the fuzz engine with some kind of distance to reach it. Maybe this could be called reachability_goal(...) (or coverage_goal(...)).

2. A macro to ensure different interesting "leaf" coverage is hit when iterating over a pre-generated, static set of (minimized) fuzz inputs in one folder. Maybe this could be called reachability_assert(...) (or coverage_assert(...)).

Such "reachability asserts" could help to more reliably trigger major coverage drops (google/oss-fuzz#11398). Not sure if this is in-scope for fuzzamoto, but I wanted to mention it, because it seems interesting in the greater fuzzing scope.

dergoegge · 2025-12-03

> The docs you link to are a bit vague around "sometimes assertions" being able to fail.

The way I understand it, failure for these assertions is defined as "over all tests we ran, the condition was never true", so in the context here, the following assertion would fail if none of the generated testcases ever managed to submit a valid transaction to the mempool.

assert_sometimes!(cond: mempool_size > 0, "Mempool is not empty");

This means that a failed assertion might flip to being satisfied over time as the fuzzing potentially discovers the right input. And in the other direction, a satisfied assertion might flip into failure if for some reason the coverage/state can no longer be reached.
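
A minimal sketch of that failure definition, assuming a campaign-wide tracker keyed by the assertion message. The `SometimesTracker` type and its methods are hypothetical and only illustrate the semantics described above; they are not fuzzamoto's implementation.

```rust
use std::collections::BTreeMap;

/// Campaign-wide record of "sometimes" assertions (hypothetical type,
/// not fuzzamoto's actual implementation).
#[derive(Default)]
struct SometimesTracker {
    /// message -> whether the condition held in at least one testcase
    seen_true: BTreeMap<String, bool>,
}

impl SometimesTracker {
    /// Record one observation of the assertion from a single testcase.
    fn observe(&mut self, message: &str, cond: bool) {
        let entry = self.seen_true.entry(message.to_string()).or_insert(false);
        *entry |= cond;
    }

    /// Messages whose condition was never true over all observed testcases.
    fn failing(&self) -> Vec<&str> {
        self.seen_true
            .iter()
            .filter(|(_, seen)| !**seen)
            .map(|(msg, _)| msg.as_str())
            .collect()
    }
}

fn main() {
    let mut tracker = SometimesTracker::default();
    // Three testcases; only the last one gets a transaction accepted.
    for mempool_size in [0u64, 0, 3] {
        tracker.observe("Mempool is not empty", mempool_size > 0);
        tracker.observe("Mempool may contain more than 1000 txs", mempool_size > 1000);
    }
    for msg in tracker.failing() {
        println!("✗ Sometimes: {msg}");
    }
}
```

Because the status is derived from everything observed so far, a failing entry flips to satisfied as soon as one testcase makes the condition true. A fresh campaign starts from an empty tracker, so an assertion that passed before can fail again.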

> Also, in the current implementation they are not able to fail, even if the fuzzing completes?

Yeah, in my implementation the sometimes assertions don't fail in the normal sense, but their status for a specific campaign can be inspected in the generated assertion.txt file, e.g.:

✗ Sometimes gt(127, 1000): Mempool may contain more than 1000 txs
✓ Always cond(true): One active tip must exist
✓ Always lt(960096, 5000000): Mempool usage does not exceed the maximum
✓ Always lte(525000000000, 2100000000000000): Coin supply is within expected limits
✓ Sometimes cond(true): Block tip may change
✓ Sometimes cond(true): Mempool is not empty
✓ Sometimes cond(true): Node under test should send addr messages
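
Entries such as gt(127, 1000) carry the operands, which is what lets a fuzzer treat "getting closer" as progress. The following is a rough sketch of one way to turn such a detail into a distance. The `Detail` enum and `distance` function are hypothetical names for illustration, and the PR's actual distance computation may differ.

```rust
/// Comparison kinds that appear in the report above; the enum itself is a
/// hypothetical stand-in, not fuzzamoto's actual type.
#[derive(Clone, Copy)]
enum Detail {
    Cond(bool),
    Gt(u64, u64),
    Lt(u64, u64),
    Lte(u64, u64),
}

/// How far the observed values are from making the condition true.
/// Zero means the condition already holds.
fn distance(detail: Detail) -> u64 {
    match detail {
        // A plain boolean gives no gradient: either satisfied or one step away.
        Detail::Cond(c) => u64::from(!c),
        // a > b holds once a reaches b + 1.
        Detail::Gt(a, b) => {
            if a > b { 0 } else { (b - a).saturating_add(1) }
        }
        // a < b holds once a drops to b - 1.
        Detail::Lt(a, b) => {
            if a < b { 0 } else { (a - b).saturating_add(1) }
        }
        // a <= b holds once a drops to b.
        Detail::Lte(a, b) => a.saturating_sub(b),
    }
}

fn main() {
    // The failing entry from the report: 127 transactions, goal is more than 1000.
    println!("gt(127, 1000) -> {}", distance(Detail::Gt(127, 1000)));
    // A satisfied entry from the report has distance zero.
    println!("lt(960096, 5000000) -> {}", distance(Detail::Lt(960096, 5000000)));
}
```

Under this model, an input that raises the mempool size from 127 to 150 shrinks the distance and could be kept even if it adds no new code coverage.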

I think I'll export the assertion "results" in a machine readable way, such that surrounding infrastructure can keep track and report issues based on them (e.g. coverage drops). Maybe also add some stats about the assertions to the stdout fuzzer output.
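
Until a machine-readable export exists, surrounding infrastructure could parse the text report itself. This sketch assumes each result sits on its own line in the form `<✓|✗> <kind> <detail>: <message>`, as in the example above. `AssertionResult` and `parse_line` are hypothetical names, not part of fuzzamoto.

```rust
/// One parsed report entry (hypothetical record type).
#[derive(Debug, PartialEq)]
struct AssertionResult {
    passed: bool,
    kind: String,    // "Always" or "Sometimes"
    detail: String,  // e.g. "gt(127, 1000)"
    message: String, // human-readable description
}

/// Parse one line of the form `<✓|✗> <kind> <detail>: <message>`.
/// Returns None for lines that do not match this shape.
fn parse_line(line: &str) -> Option<AssertionResult> {
    let line = line.trim();
    let (passed, rest) = if let Some(r) = line.strip_prefix('✓') {
        (true, r)
    } else if let Some(r) = line.strip_prefix('✗') {
        (false, r)
    } else {
        return None;
    };
    let (kind, rest) = rest.trim_start().split_once(' ')?;
    // The detail ends at the first "): " so that messages may contain colons.
    let (detail, message) = rest.split_once("): ")?;
    Some(AssertionResult {
        passed,
        kind: kind.to_string(),
        detail: format!("{detail})"),
        message: message.to_string(),
    })
}

fn main() {
    let report = "✗ Sometimes gt(127, 1000): Mempool may contain more than 1000 txs\n\
                  ✓ Always cond(true): One active tip must exist";
    for result in report.lines().filter_map(parse_line) {
        println!("{result:?}");
    }
}
```

A CI job could then flag any entry with `passed == false`, or compare two campaigns to detect an assertion that flipped from satisfied to failing.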

> Such "reachability asserts" could help to more reliably trigger

I'll think about your suggested macros but I feel like the macros I have now could already satisfy those needs?

maflcko · 2025-12-03

Yes, if your macro is designed so that surrounding infra decides how to treat a coverage miss, then both needs (explicitly assert on coverage, or just set a silent coverage goal) are satisfied. The only difference is that one approach puts the treatment of coverage misses in the source code directly, the other approach puts the treatment in the surrounding infra.

I guess a third option could also be to keep a single macro and set an env var to denote how the macro should behave: COVERAGE_ASSERT=0/1.
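
A sketch of that third option, assuming the macro ultimately calls a helper that consults the environment. Only the COVERAGE_ASSERT name comes from the comment above; `MissPolicy`, `policy_from`, and `check_goal` are hypothetical.

```rust
use std::env;

/// What to do with a coverage goal that was never reached.
#[derive(Debug, PartialEq)]
enum MissPolicy {
    /// Only record the miss; surrounding infra decides what it means.
    Report,
    /// Treat the miss as a hard failure.
    Fail,
}

/// Interpret the value of COVERAGE_ASSERT; anything other than "1" keeps
/// the silent behaviour.
fn policy_from(value: Option<&str>) -> MissPolicy {
    match value {
        Some("1") => MissPolicy::Fail,
        _ => MissPolicy::Report,
    }
}

/// Returns Err for a missed goal only when the policy says misses are failures.
fn check_goal(reached: bool, message: &str, policy: &MissPolicy) -> Result<(), String> {
    if reached {
        return Ok(());
    }
    match policy {
        MissPolicy::Report => {
            eprintln!("coverage goal missed: {message}");
            Ok(())
        }
        MissPolicy::Fail => Err(format!("coverage assert failed: {message}")),
    }
}

fn main() {
    let value = env::var("COVERAGE_ASSERT").ok();
    let policy = policy_from(value.as_deref());
    match check_goal(false, "Mempool is not empty", &policy) {
        Ok(()) => println!("campaign result: ok"),
        Err(e) => println!("campaign result: {e}"),
    }
}
```

Keeping the parsing in `policy_from` separate from the environment lookup makes the behaviour testable without mutating process state.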

But all of this can trivially be implemented later and any approach looks good to me.

Crypt-iQ

Reviewed the code and it's clear and makes sense to me, will run. Left some clarifying questions.

Crypt-iQ · 2026-01-27T15:59:28Z

+                ("Sometimes", detail, msg)
+            }
+            AssertionScope::Always(inner, msg) => {
+                fires = !fires;


wondering why this gets inverted? Is it because it was already inverted when assertion.evaluate calls distance?

Crypt-iQ · 2026-01-27T21:19:25Z

+        let previous = self.assertions.get(&new.message());
+
+        let result = match (previous, &new) {
+            (None, new) => new.evaluate() || !self.only_always_assertions,


why is !self.only_always_assertions needed? Is it so that a Sometimes entry can be put into self.assertions and then later evaluate_assertion calls get closer in distance?

Crypt-iQ · 2026-01-28T17:40:42Z

@@ -1,3 +1,4 @@
+#include <assert.h>


curious why this is needed?

dergoegge · 2026-02-13T16:52:27Z

Just rebased, haven't addressed review feedback yet

Crypt-iQ · 2026-04-08T18:53:59Z

                ConstFeedback::new(self.options.minimize_input.is_none()),
-                map_feedback
+                // Every 5th instance (skipping 0) has coverage feedback disabled
+                ConstFeedback::new(self.client_description.core_id().0 % 5 != 1),


Because this is in a feedback_or which evaluates all arguments and does not short-circuit, the AssertionFeedback still calls is_interesting for all cores. If core 0 finds an input that also triggered "better" assertions, the minimization step will ensure that the assertions are maintained.

Also, the map_feedback may not find an input interesting which then makes the AssertionFeedback run for cores where the feedback should be disabled.
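
To make the evaluation-order point concrete, here is a toy model of an eager OR next to a short-circuiting one. The `Feedback` trait below is a hypothetical, heavily simplified stand-in for LibAFL's trait, and the sketch does not reproduce the PR's actual feedback tree.

```rust
/// Minimal stand-in for a fuzzer feedback; this trait is hypothetical and
/// much simpler than LibAFL's `Feedback`.
trait Feedback {
    fn is_interesting(&mut self, input: &[u8]) -> bool;
}

/// Always returns the same verdict and counts how often it was asked.
struct Counting {
    verdict: bool,
    calls: usize,
}

impl Feedback for Counting {
    fn is_interesting(&mut self, _input: &[u8]) -> bool {
        self.calls += 1;
        self.verdict
    }
}

/// Eager OR: every feedback runs, even if an earlier one already said yes.
fn or_eager(fs: &mut [&mut dyn Feedback], input: &[u8]) -> bool {
    let mut any = false;
    for f in fs.iter_mut() {
        // `|=` on bool always evaluates the right-hand side.
        any |= f.is_interesting(input);
    }
    any
}

/// Short-circuit OR: stops at the first feedback that says yes.
fn or_fast(fs: &mut [&mut dyn Feedback], input: &[u8]) -> bool {
    fs.iter_mut().any(|f| f.is_interesting(input))
}

fn main() {
    let mut first = Counting { verdict: true, calls: 0 };
    let mut second = Counting { verdict: false, calls: 0 };
    {
        let mut fs: [&mut dyn Feedback; 2] = [&mut first, &mut second];
        or_eager(&mut fs, b"input");
    }
    println!("eager: second feedback ran {} time(s)", second.calls);
    {
        let mut fs: [&mut dyn Feedback; 2] = [&mut first, &mut second];
        or_fast(&mut fs, b"input");
    }
    println!("after short-circuit: second feedback ran {} time(s)", second.calls);
}
```

Note that even the short-circuiting variant still consults the second feedback whenever the first one returns false, so a constant false gate in an OR does not by itself switch off the feedbacks that follow it.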

Crypt-iQ · 2026-04-13T19:27:38Z

I noticed that even with this PR (and even with many of my own tweaks), the mempool size always plateaued between 100 and 200. I took a benchmark of this PR compared to a baseline (this PR with the AssertionFeedback code deleted) for 12 hours on 12 cores and got the following graph:

The graph matches what I expected (and have observed, without numbers, in other runs): assertion feedback (1) does help reach larger mempool sizes, and (2) a plateau definitely occurs. When assertion feedback is enabled, the fuzzer populates the mempool quickly and then has trouble populating it further as time goes on. When the feedback is disabled, the fuzzer takes longer to populate the mempool, the mempool is smaller overall, and it also gets more difficult to populate as time goes on.

I tried several approaches to improve this (e.g. a custom scheduler that prioritized assertion improvements even more, bumping the transaction generator weights), but to no avail. My hunch is that the plateau occurs because of a lack of feedback on whether a transaction is accepted into the mempool, and on whether conflicting transactions have since become invalid.

I ran #118 with assertion feedback and with the -debug=validation, -debug=mempool, and -debug=mempoolrej log categories. I found that most of the errors for rejecting a transaction were due to bad-txns-inputs-missingorspent. This suggests that transactions had parents which were replaced or were never valid.

Some things I noticed which contribute to the problem:

1. choose_index for the transaction generators may invalidate a chain of transactions if inserted in the IR at a point before the chain is created. This also lets us test the RBF logic so it is both good and bad.
2. get_random_utxos may consume a lot of UTXOs. If the transaction is then invalid, all of the UTXOs are marked as spent and a subsequent call to get_random_utxos won't return them. The exception being that choose_index for a generator may return a point before these UTXOs get consumed which allows them to be used.
3. An invalid transaction's UTXOs will be marked as available despite them being unusable.
4. (minor) TxoGenerator must run before the UTXOs from the context are usable and it only loads one at a time. I think it would be more efficient if it instead loaded all of the UTXOs at once.
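
The get_random_utxos behaviour described above can be modelled with a toy pool. Everything here is hypothetical: `UtxoPool`, `take`, and `release` are illustration-only names. Selection is deterministic rather than random, and `release` is just one conceivable remedy that presumes the generator learns a transaction was rejected, which is exactly the feedback that is currently missing.

```rust
use std::collections::BTreeSet;

/// Toy model of a generator-side UTXO set (hypothetical type).
struct UtxoPool {
    available: BTreeSet<u32>,
}

impl UtxoPool {
    fn new(ids: impl IntoIterator<Item = u32>) -> Self {
        Self { available: ids.into_iter().collect() }
    }

    /// Take up to `n` UTXOs and mark them as spent immediately, before it
    /// is known whether the spending transaction is valid.
    fn take(&mut self, n: usize) -> Vec<u32> {
        let picked: Vec<u32> = self.available.iter().copied().take(n).collect();
        for id in &picked {
            self.available.remove(id);
        }
        picked
    }

    /// Hypothetical remedy: hand the inputs back once the transaction is
    /// known to have been rejected.
    fn release(&mut self, inputs: &[u32]) {
        self.available.extend(inputs.iter().copied());
    }
}

fn main() {
    let mut pool = UtxoPool::new(0..4);
    let inputs = pool.take(3);
    // Without validity feedback the inputs stay unavailable even if the
    // transaction was rejected.
    println!("available after take: {}", pool.available.len());
    pool.release(&inputs);
    println!("available after release: {}", pool.available.len());
}
```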

Ultimately though, the issue imo is not the four things listed above but instead a lack of feedback on whether the fuzzer is making valid transactions. I'm not sure what that feedback would look like and how complex it would be to track validity and invalidity. This is tangentially related to this PR since the assertion feedback is useful and clearly improves coverage, but in the mempool case it is kind of hampered by other factors that I didn't expect.

dergoegge · 2026-05-14T13:20:03Z

> I found that most of the errors for rejecting a transaction were due to bad-txns-inputs-missingorspent. This suggests that transactions had parents which were replaced or were never valid.

I think #87 was also trying to address this, i.e. using probing to make smarter mutations with respect to extending the mempool.

maflcko reviewed Dec 1, 2025

Crypt-iQ reviewed Jan 28, 2026

Crypt-iQ mentioned this pull request Feb 12, 2026: add support for incremental snapshots #103 (open)

dergoegge force-pushed the assertions branch from 59416e4 to 5ed436f Compare February 13, 2026 16:51

dergoegge force-pushed the assertions branch 2 times, most recently from 8a91e66 to 7503930 Compare February 13, 2026 17:05

Crypt-iQ reviewed Apr 8, 2026

dergoegge force-pushed the assertions branch 2 times, most recently from c398b60 to 6d2477c Compare May 14, 2026 13:02

fuzzamoto-libafl: Add assertion-guided feedback

44db98c

dergoegge force-pushed the assertions branch from 6d2477c to 44db98c Compare May 14, 2026 13:03
