Bitcoin Toolchain Unit Testing And Deterministic Builds
Speakers: Marco Falke
Transcript By: Bryan Bishop
Just to continue on what James said about the build system on Bitcoin Core… I am going to talk about deterministic builds. I am MarcoFalke and I also work at Chaincode Labs in NYC.
Bitcoin Core Build System
The build system is based on autotools, so it should just work anywhere where autotools runs. Just run ./autogen.sh ./configure and then make, that’s it.
We recently added support for MSVC builds mostly for Windows developers to do native builds not having to switch between Linux and Windows all the time.
For release builds, we use cross-compilation which is currently only supported on Ubuntu.
Bitcoin Core Modules and Targets
To look at all the modules and targets that our build system supports, so James was talking about regions, but in the build system they are called modules. There’s some basic modules like util which takes care of logging and random number generation. Or cryptographic primitives which provide hash functions. We have bitcoin-cli which is one build target to provide a utility command line interface to speak with a Bitcoin server to ask for maybe the chain state, mempool state, wallet state. This one depends on the basic modules. We do have some libraries included directly. For convenience, we include univalue which is not commonly distributed on major distributions. We include it for convenience. We also include the elliptic curve library, libsecp256k1, and leveldb, because they are consensus-critical. We also have dependencies on system libraries, for example openssl which we use only the random number generator and we use boost for advanced C++ language features.
Another build target is libbitcoin-consensus. This is a long term project. It was designed to provide just a library that takes care of all the consensus that is going on. You could, for example, have your node, for some reason, consensus changes- maybe there’s a soft-fork or hard-fork then you only update your library and you’re running the latest consensus but you don’t have to upgrade your whole company infrastructure. For example you don’t have to update your RPC interface etc. But right now, libbitcoin-consensus is really minimal and it can’t deal with anything that is stateful. It can do some stateless checks such as simple transaction verification.
We have another build target called bitcoin-tx which is a utility to create and modify bitcoin transactions. And finally, all the other targets including bitcoind, bitcoin-qt, and test_bitcoin. They all pretty much depend on everything. bitcoind, the daemon that runs in the background, and bitcoin-qt is the GUI, and test_bitcoin is a ton of test binaries.
Bitcoin Core Testing
A bit about testing, I’m not going to explain testing in general. There are obvious advantages to testing. For Bitcoin Core the most important reason in my opinion is that we need to have some harness to check that consensus rules do not change accidentally. There are other advantages such as the design feedback loop when you write a new feature gives you some feedback or a hint on how to document that. Finally the nice warm and fuzzy feeling when travis-ci is green. Travis is the continuous integration we use on the GitHub project. When Travis is green it is pretty much meaningless, I will explain later what that means.
To jump into some testing issues. If you look at the coverage we currently have. We have pretty good line coverage and function coverage which is greater than 80%. But then branch coverage is around 50% and then path coverage I couldn’t get a number on but I assume it is something like 0%. That is just because when you have a really large program, a lot of lines of code, a lot of branches, you can never get full path coverage. It is almost impossible. For example, look at this two line program. Let’s assume it is the only function. You call this program once then you have 100% function coverage and you also have 100% line coverage. But for branch coverage it depends on what you pass in for those booleans. You could run it once then you have 50% branch coverage. For path coverage it is even worse, you would have to run it four times to cover all of the possible paths. Let’s say first you run A and then X. You could also run A and Y or B and X or B and Y. Adding another line in this program in the same fashion makes it 8 paths and it is going up exponentially. It is impossible to test all of these cases.
Even if we had full coverage Dijkstra said you can only ever show the presence of bugs, you can never prove that there are no bugs. In addition to tests we need some sort of other tools or techniques to prevent accidental changes to the consensus behavior. What we do is be reluctant in general towards changes. Whenever there is something that seems unnecessary just avoid making the change. If there is something that has some clear motivation it needs to go through a thorough review process. It is not really clear what thorough review means. It is not clear how many people should look at it. Even if people looked at it and acknowledged the change it is not clear when enough people have looked at it. When no one has found an issue it doesn’t mean there is no issue. In the long term there should be hopefully a formal verification tool that generally works and can prove the consensus rules at least didn’t change. This is mostly a research topic, as far as I’m aware there is no practical tool to achieve this today. We do have some tools such as a scripted-diff on the source code which was contributed by Cory Fields. Instead of making a huge change over all source files you only provide the essential change in the form of a script. Then you run this script and the script defines how the commit looks. Another way is build-for-compare which was contributed by Wladimir van der Laan which basically does an object dump before the change and then compiles the executables. It does an object dump after the changes. You can compare what changed in the object dump and figure out if it was wanted. If nothing changed, which is usually great, you can only be certain that nothing changed for that particular compiler version you used, that particular architecture you run the compiler on. It is not exhaustive.
To continue on this thought, what is helpful for proving that a change didn’t change the binaries too much is deterministic builds. There’s a great website with all the information on it.
Just to give one sentence of it. Deterministic builds provide an “independently-verifiable path from source code to the binary code.” To give a really short advantages and comparison. Let’s take for example plain old builds. You want to compare for build verification purposes. For plain old builds there was no general way to do this, you’d have to manually compare disassembly. Whereas with deterministic builds you could do this automatically and check for bitwise identity. For example you could have a build server run the build script and then check…. Let’s say you want to compare how the Ubuntu PPA packages, when they are compiled they are compiled without additional backdoors. You could run in addition to the Ubuntu build server farm you could run your own farm, build all the packages and then afterward check for identity. Maybe even sign off on these changes, have your build server sign off if there is no difference. With conventional builds they can only meaningfully be signed by the release maintainer or the release build server if it is an automatic script. Whereas with deterministic builds these can be signed by anyone and by any builder. You could have for example this build server or you build it on your own laptop, your own machine and then compare the results and sign the result. If some of your friends want to run Bitcoin Core and they don’t want to trust some random website on the internet that they provide the correct binaries or maybe they don’t want to trust the Ubuntu PPA or the Ubuntu build server to compile the right Bitcoin code without any backdoors or issues. You could ask some friends and if they signed the binaries, if they did these deterministic builds and signed them, you could download them from anywhere, compare them and check the signatures. Or ideally you just go ahead and do all the building yourself and also compare with your friends.
Some common issues we found with deterministic builds or reasons for unclean builds. The most common one is probably timestamps, when you run the compiler it is going to embed the timestamp of the current system somewhere in the executable. These are often easy to work around by just setting a constant timestamp. Other issues happen when the system you’re building on has different tools for different libraries to build. For example, Python version 3 has a different implementation of the dictionary and then sorts the keys in the dictionary in a different order. So suddenly some parts of the resulting executable are differently sorted and you get a different result. Of course different compiler versions will result in non-deterministic builds because different compiler versions, maybe they only embed the version of the compiler in the executable and then it is going to differ in that way. Or they have some different behavior to optimize the code and then you get differently sized binaries that are also going to be non-deterministic. There are some fun things like number of CPU cores you use to compile which could happen for example if there are different modules and one is going to finish earlier and it is going to be written earlier to disk. Or in some way it is differently sorted. What is awesome for tracking down issues is this tool called diffoscope which I think was originally written by the Debian folks who also work on deterministic builds. You can install it I think on pretty much any Linux system other than Fedora. It is on the Python package manager and you can install additional dependencies with your own package manager. I find that really helpful.
To give an overview on how deterministic builds look like today. To remind, it is this path from source code to binary code. As input you have digital source code and some sort of descriptor how to run the build. These need to be exactly the same for all the builders. Then our Gitian build environment is currently Ubuntu 18.04. You can run it and in the first step it is going to produce the executables for all the architectures, for Windows, MacOS, Linux architectures such as ARM and a tonne more. I think it takes on a single core to run this more than a day. Then we have another step. Windows and Mac require that these binaries that you run on the system are somehow signed with a key that is also signed off by Windows or Apple in some way. What we do is in the second step is create some detached signatures that work on those operating systems and then do another step that is really quick that combines these detached signatures with the binaries which yields the final binaries for the Windows and Mac operating system.
Deterministic Builds — Progress
To wrap up, some issues that we are currently working on, to give an overview of the progress. I mentioned that we use Ubuntu for the build environment which isn’t ideal. I guess we can be pretty certain that there is probably no backdoor in Ubuntu. If there was, most people run Ubuntu and if they run a non-backdoored Bitcoin Core on a backdoored Ubuntu it is not really going to help. It would be great to have a way to pin the compiler version to some defined version. If Ubuntu was to change the compiler version of the environment we are currently using we can no longer easily reproduce all the builds that happen in the past because different compiler versions means different executables. This indeed happened I think for Bitcoin Core 0.12 for something. Then another thing is where we do these detached signatures. Right now it is done for each operating system by a single person so there is some kind of bottleneck. It would be nicer if this was distributed so the trust would be spread out. It could be achieved with distributed RSA and have multiple maintainers sign off on the binaries. What works already today is of course the intermediate Gitian build results are signed by a lot of people who build them. All the signatures are put in a repository and everyone can and should compare them. That concludes my talk.