fuzzamoto-libafl: Assertions as feedback#69
Conversation
maflcko
left a comment
There was a problem hiding this comment.
nice. left a comment, but feel free to ignore.
| # Assertions | ||
|
|
||
| Fuzzamoto implements a feedback-guided assertion system inspired by | ||
| [Antithesis's sometimes | ||
| assertions](https://antithesis.com/docs/best_practices/sometimes_assertions/), | ||
| designed to both validate program correctness and guide fuzzing toward | ||
| interesting execution states. | ||
|
|
||
| The assertion system is only available when fuzzing with | ||
| [`fuzzamoto-libafl`](../usage/libafl.md). |
There was a problem hiding this comment.
The docs you link to are a bit vague around "sometimes assertions" being able to fail. Also, in the current implementation they are not able to fail, even if the fuzzing completes?
I am thinking, it could make sense to have both:
- A macro to guide the fuzz engine to mark a condition as interesting and provide the fuzz engine with some kind of distance to reach it. Maybe this could be called
reachability_goal(...)(orcoverage_goal(...)) - A macro to ensure different interesting "leaf" coverage is hit when iterating over a pre-generated, static set of (minimized) fuzz inputs in one folder. Maybe this could be called
reachability_assert(...)(orcoverage_assert(...)).
Such "reachability asserts" could help to more reliably trigger major coverage drops (google/oss-fuzz#11398). Not sure if this is in-scope for fuzzamoto, but I wanted to mention it, because it seems interesting in the greater fuzzing scope.
There was a problem hiding this comment.
The docs you link to are a bit vague around "sometimes assertions" being able to fail.
The way I understand it, failure for these assertions is defined as "over all tests we ran, the condition was never true", so in the context here, the following assertion would fail if none of the generated testcases ever managed to submit a valid transaction to the mempool.
assert_sometimes!(cond: mempool_size > 0, "Mempool is not empty");This means that a failed assertion might flip to being satisfied over time as the fuzzing potentially discovers the right input. And in the other direction, a satisfied assertion might flip into failure if for some reason the coverage/state can no longer be reached.
Also, in the current implementation they are not able to fail, even if the fuzzing completes?
Yea in my implementation the sometimes assertions don't fail in a normal sense, but their status for a specific campaign can be inspected by looking at the assertion.txt file that is generated, e.g.:
✗ Sometimes gt(127, 1000): Mempool may contain more than 1000 txs
✓ Always cond(true): One active tip must exist
✓ Always lt(960096, 5000000): Mempool usage does not exceed the maximum
✓ Always lte(525000000000, 2100000000000000): Coin supply is within expected limits
✓ Sometimes cond(true): Block tip may change
✓ Sometimes cond(true): Mempool is not empty
✓ Sometimes cond(true): Node under test should send addr messages
I think I'll export the assertion "results" in a machine readable way, such that surrounding infrastructure can keep track and report issues based on them (e.g. coverage drops). Maybe also add some stats about the assertions to the stdout fuzzer output.
Such "reachability asserts" could help to more reliably trigger
I'll think about your suggested macros but I feel like the macros I have now could already satisfy those needs?
There was a problem hiding this comment.
I think I'll export the assertion "results" in a machine readable way, such that surrounding infrastructure can keep track and report issues based on them (e.g. coverage drops). Maybe also add some stats about the assertions to the stdout fuzzer output.
Such "reachability asserts" could help to more reliably trigger
I'll think about your suggested macros but I feel like the macros I have now could already satisfy those needs?
Yes, if your macro is designed so that surrounding infra decides how to treat a coverage miss, then both needs (explicitly assert on coverage, or just set a silent coverage goal) are satisfied. The only difference is that one approach puts the treatment of coverage misses in the source code directly, the other approach puts the treatment in the surrounding infra.
I guess a third option could also be to keep a single macro and set an env var to denote how the macro should behave: COVERAGE_ASSERT=0/1.
But all of this can trivially be implemented later and any approach looks good to me.
Crypt-iQ
left a comment
There was a problem hiding this comment.
Reviewed the code and it's clear and makes sense to me, will run. Left some clarifying questions.
| ("Sometimes", detail, msg) | ||
| } | ||
| AssertionScope::Always(inner, msg) => { | ||
| fires = !fires; |
There was a problem hiding this comment.
wondering why this gets inverted? Is it because it was already inverted when assertion.evaluate calls distance?
| let previous = self.assertions.get(&new.message()); | ||
|
|
||
| let result = match (previous, &new) { | ||
| (None, new) => new.evaluate() || !self.only_always_assertions, |
There was a problem hiding this comment.
why is !self.only_always_assertions needed? Is it so that a Sometimes entry can be put into self.assertions and then later evaluate_assertion calls get closer in distance?
| @@ -1,3 +1,4 @@ | |||
| #include <assert.h> | |||
There was a problem hiding this comment.
curious why this is needed?
59416e4 to
5ed436f
Compare
|
Just rebased, haven't addressed review feedback yet |
8a91e66 to
7503930
Compare
| ConstFeedback::new(self.options.minimize_input.is_none()), | ||
| map_feedback | ||
| // Every 5th instance (skipping 0) has coverage feedback disabled | ||
| ConstFeedback::new(self.client_description.core_id().0 % 5 != 1), |
There was a problem hiding this comment.
Because this is in a feedback_or which evaluates all arguments and does not short-circuit, the AssertionFeedback still calls is_interesting for all cores. If core 0 finds an input that also triggered "better" assertions, the minimization step will ensure that the assertions are maintained.
Also, the map_feedback may not find an input interesting which then makes the AssertionFeedback run for cores where the feedback should be disabled.
|
I noticed that even with this PR (and even with many of my own tweaks), the mempool size always plateaued between 100 and 200. I took a benchmark of this PR compared to a baseline (this PR with the The graph matches what I expected (and have observed without numbers in other runs) that assertion feedback 1. does help reach larger mempool sizes, and 2. a plateau definitely occurs. When assertion feedback is enabled, the fuzzer populates the mempool quickly and then has trouble populating more as time goes on. When the feedback is disabled, the fuzzer takes longer to populate the mempool, the mempool is smaller overall, and it also gets more difficult to populate as time goes on. I tried several approaches to improve this (e.g. having a custom scheduler that prioritized assertion improvements even more, bumping the transaction generator weights), but to no avail. My hunch is that the plateau occurs because of a lack of feedback on whether a transaction is accepted into the mempool or not and also whether a conflicting transaction(s) has now become invalid. I ran #118 with assertion feedback and with the Some things I noticed which contribute to the problem:
Ultimately though, the issue imo is not the four things listed above but instead a lack of feedback on whether the fuzzer is making valid transactions. I'm not sure what that feedback would look like and how complex it would be to track validity and invalidity. This is tangentially related to this PR since the assertion feedback is useful and clearly improves coverage, but in the mempool case it is kind of hampered by other factors that I didn't expect. |
c398b60 to
6d2477c
Compare
I think #87 was also trying to address this. I.e. use probing to make smarter mutations wrt extending the mempool |

No description provided.