
Fuzzing

Speakers: Niklas Gögge

Date: April 27, 2023

Tags: Bitcoin Core, Developer tools

Category: Core dev tech

Slides: https://docs.google.com/presentation/d/1NlTw_n60z9bvqziZqU3H3Jw7Xs5slnQoehYXhEKrzOE

Fuzzing

  • Fuzzing is done continuously. Fuzz targets can pay off even years later by finding newly introduced bugs.
  • The slides show an example of libFuzzer fuzzing a parse_json function: the function might crash on some weird input, but the fuzzer won’t report invalid JSON inputs that the parser correctly rejects. libFuzzer uses a coverage-guided feedback loop, which helps with exploring control flow (see the sketch below).
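
A minimal libFuzzer target along the lines of that slide might look as follows; parse_json() is a hypothetical function under test, not an existing API:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical parser under test: returns false for invalid JSON, but may
// crash or trip a sanitizer on some unexpected input - that is what the
// fuzzer is hunting for.
bool parse_json(const std::string& input);

// libFuzzer calls this entry point repeatedly, mutating `data` based on a
// coverage-guided feedback loop.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size)
{
    const std::string input(reinterpret_cast<const char*>(data), size);
    // Returning false (invalid JSON) is not a bug and is not reported;
    // only crashes, sanitizer findings and failed assertions are.
    (void)parse_json(input);
    return 0;
}
```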

Bug Oracles

  • Assertions - Adding assertions is tricky for network code. We add Assume() when continuing is not worse than crashing: Assert() will crash in production, while Assume() will only crash in debug builds. Sprinkling Assume() all over the code is fine, but it slows down production. Prefer placing assertions in the fuzz target itself.
  • Resource limits - e.g. don’t take more than 10 MB of memory or 5 seconds of runtime.
  • Sanitizers - examples include the undefined behaviour, thread, leak, memory and address sanitizers. They can add a performance or memory overhead. Everything except the memory sanitizer is easy to use. Recommendation: don’t use multiple sanitizers at once.
  • Function inverse pairs - e.g. encode/decode round-trips (see the sketch after this list).
  • Differential fuzzing - fuzz the current implementation against a simpler implementation and compare the results. Done for coinscache and txrequest.
  • Null space transformation - only apply mutations that preserve the semantics of the input.
  • Domain-specific checks - e.g. a Bitcoin block that is valid under a soft fork rule should also be valid when the soft fork is not enforced.
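
As a sketch of the function-inverse-pair oracle, with hypothetical Encode()/Decode() functions and the assertion placed in the fuzz target itself:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical inverse pair under test.
std::vector<uint8_t> Encode(const std::vector<uint8_t>& in);
std::vector<uint8_t> Decode(const std::vector<uint8_t>& in);

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size)
{
    const std::vector<uint8_t> original(data, data + size);
    // Oracle: decoding what was encoded must give back the original bytes.
    const std::vector<uint8_t> roundtrip = Decode(Encode(original));
    // The assertion lives in the fuzz target, not in production code, so it
    // costs nothing outside of fuzzing.
    assert(roundtrip == original);
    return 0;
}
```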

Best practices for targets

  • Avoid non-bug crashes - the target itself should not crash on inputs that don’t indicate real bugs. Translating unit tests into fuzz tests (done for the wallet) is a good starting point.
  • Verify coverage - combine knowledge about what the function is supposed to do with knowledge about testing: write the target, then verify that it actually reaches the code you want to test. Check this using coverage statistics, or by temporarily inserting an assert(false) and confirming the fuzzer hits it.
  • Determinism - the same input must always reproduce the same behaviour; otherwise a bug the fuzzer found can’t be reproduced from its input. Ideally include fuzz tests in your PR and keep different fuzz targets distinct. The input from the fuzzer is not random - it’s the best input found so far. Avoid actual randomness: use fixed seeds to mock it out (see the sketch after this list).
  • Performance - compute is well spent as long as coverage keeps increasing; once coverage maxes out, you have to explicitly stop fuzz targets. Fuzzing with sanitizers sometimes gives additional coverage, but sanitizers generally slow you down, so run the corpus under sanitizers later as an extra sanity check. Reset global state at the end of each iteration and try to avoid global state altogether. Avoid expensive I/O. Sometimes time is wasted on less interesting inputs.
  • Keep the scope of the target small - think about what you want to test and focus the fuzzer on that part of the code. Split larger APIs into multiple targets if necessary.
  • Read fuzzing docs
  • Think of writing fuzz targets as auditing the code, just like writing unit tests or functional tests.
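
As a sketch of the determinism point above: inject the RNG into the code under test and derive its seed from the fuzz input, so nothing depends on real entropy. PickPeer() here is a hypothetical example, not Bitcoin Core code:

```cpp
#include <cstddef>
#include <cstdint>
#include <random>

// Hypothetical component that takes an injectable RNG, so tests control all
// randomness instead of relying on real entropy.
int PickPeer(std::mt19937_64& rng, int num_peers)
{
    return static_cast<int>(rng() % static_cast<uint64_t>(num_peers));
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size)
{
    if (size < 8) return 0;
    // Derive the RNG seed from the fuzz input itself: every crash then
    // reproduces from its input file alone, with no hidden randomness.
    uint64_t seed = 0;
    for (int i = 0; i < 8; ++i) seed = (seed << 8) | data[i];
    std::mt19937_64 rng{seed};
    (void)PickPeer(rng, /*num_peers=*/8);
    return 0;
}
```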

Net processing

  • Integration fuzzing? How do we address interactions between components using fuzz targets, and improve some of our integration-based fuzzing?
  • When fuzzing net processing, you don’t really care about ping/pong going on at the same time, or about validation logic - when testing transaction relay, for example. The fuzzer gets confused when there is too much surface and makes a lot of unnecessary mutations.
  • ProcessMessage() is, for the most part, one giant switch statement - fuzzing of each individual message type is included (see the sketch after this list). libFuzzer is not able to produce valid transactions, blocks or headers due to PoW, signatures, hashes, etc., so we could/should mock out validation to enable better fuzzing of net processing. Boundary testing - not really fuzzing, but a different technique - may be fine to have as a new way to test interactions; you still need to put wrappers around functions.
  • txrequest, txorphan, headerssync - encapsulate these into their own modules and test them separately. We want to fuzz net processing in isolation by doing more refactoring and mocking. Some work exists already, but more fixes are needed.
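
A minimal sketch of narrowing a net-processing target to a single message type, assuming a hypothetical HandleMessage() wrapper (in Bitcoin Core the real entry point is PeerManager’s message processing):

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical wrapper around the message handler under test.
void HandleMessage(const std::string& msg_type, const std::vector<uint8_t>& payload);

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size)
{
    // Pin the message type so the fuzzer spends its mutations on the payload
    // of one handler instead of rediscovering message-type strings.
    HandleMessage("inv", std::vector<uint8_t>(data, data + size));
    return 0;
}
```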

Input splitting

  • Input splitting - split the raw byte stream into valid C++ data structures.
  • FuzzedDataProvider - a utility to split fuzz inputs into C++ data structures. For example, if you want to fuzz block processing, use FuzzedDataProvider to parse fuzz inputs into valid blocks. We could write code that produces valid semantics directly, but that would put too much code on the test side (see the sketch below).
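
For example, a target can use FuzzedDataProvider (the single-header utility that ships with LLVM) roughly like this; the carved-out values are placeholders for whatever the code under test needs:

```cpp
#include <fuzzer/FuzzedDataProvider.h> // single-header utility shipped with LLVM
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size)
{
    FuzzedDataProvider provider(data, size);

    // Carve typed C++ values out of the raw byte stream.
    const uint32_t version = provider.ConsumeIntegral<uint32_t>();
    const bool segwit = provider.ConsumeBool();
    const std::string label = provider.ConsumeRandomLengthString(/*max_length=*/64);

    // Whatever is left can feed the code under test, e.g. a deserializer.
    const std::vector<uint8_t> payload = provider.ConsumeRemainingBytes<uint8_t>();

    (void)version; (void)segwit; (void)label; (void)payload;
    return 0;
}
```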

Net Processing

  • see https://github.com/bitcoin/bitcoin/issues/27502
  • Made an issue with all the stuff I want to do if we want to fuzz net processing in isolation and mock out the networking side.
  • This is in a position to motivate all the refactoring - we don’t hate refactoring. We can turn the red (uncovered) regions in net processing green, so that those code regions are covered. That’d be pretty good. This code has found some bugs already.
  • Performance - currently these targets create one global instance of the chainstate manager etc., but if you want to make the target deterministic you need to create a new chainstate manager instance in each iteration, and that is currently very slow: a target I made ran at 10-50 exec/sec. In the future, in-memory block storage can be used in tests for better performance.
  • Module separation - I think the module separation between net and net processing has come along pretty well. Almost done - still some work left to do; reviews appreciated. Fuzzing seems to be a great motivation for refactoring.
  • Net processing/validation split: lots of overlap with the kernel work - that’s going to be a lot of work and hard; not all the things that need to be done have been mapped out yet.
  • Speaking of net processing refactors - a bug was found in ProcessMessage(). If the only motivation is refactoring, maybe we shouldn’t do it; but fuzzing finds bugs, so it’s important too.
  • Refactoring changes need to be accompanied by fuzzing, because refactoring often produces bugs. We don’t have a fuzz target that specifically fuzzes the version handshake; the branch I have does that, plus some additional testing. The reason we don’t have version handshake fuzzing is that peers need to be initialised before ProcessMessage() works for them.
  • Suggestion to have a document that explains all the interfaces introduced when refactoring.

Bitcoin Core’s Fuzzing Infrastructure

  • Contributors run their own fuzzing infra (e.g. Marco), plus OSS-Fuzz (a ClusterFuzz instance managed by Google).
  • Google donates CPU to open source projects for fuzzing: CPU per target, per sanitizer, per fuzz engine.
  • How to contribute inputs back to qa-assets? Check the --generate option.
  • There are two ways to run a fuzz test: with an empty corpus, or with the corpus from the qa-assets repo. You clone the corpus from qa-assets locally - the thing is, it gets really big (though still less than a blockchain :)). GitHub limiting the qa-assets repo size is a possibility, but not a concern currently.
  • At branch-off time, the qa-assets repo is updated. The new coverage corpus can be 10-15 GB.
  • Sometimes it’s good to start with an empty corpus, which can yield additional coverage. When working on code that has a fuzzer associated with it, I’ll sometimes strategically run that fuzzer with an empty corpus - it’s a gut feeling; I don’t know a good rule.
  • Do we delete inputs from the corpus? Yes, at branch-off.
  • Fuzzing also runs in CI - the existing seeds are run for 2 minutes.
  • Run our own ClusterFuzz instance? Not so easy, plus extra maintenance.
  • Reliance on Google’s fuzzing infra? We have access to OSS-Fuzz’s corpus. What if Google kicks us off or doesn’t tell us about bugs? They have always disclosed bugs in past incidents. Recently there was an overflow bug in the miniscript logic; at the time I was fuzzing on a 30-core machine, and OSS-Fuzz surfaced it within the same 24-hour window - an indication that they have been honest.
  • There is a 90-day bug disclosure deadline set by OSS-Fuzz, but Bitcoin Core gets an exception. It has never been needed, and it’s unlikely that a severe bug is found that way. So far, we usually find the bad bugs while writing new targets and running them on our own infra.

What next

  • Contribute inputs to the corpus in the qa-assets repo that add additional coverage
  • Review PRs
  • Write fuzz tests