Jeremy Rubin - Bitcoin Core (2018-04-23)
Transcript By: Michael Folkson
Tags: Bitcoin core
A hardCORE workout
Thank you very much for the warm welcome. So welcome to the hard core workout. It is not going to be that difficult but you can stretch if you need to. It is going to be a lot of material so I’m going to go a little bit fast. If you have any questions feel free to stop me in the middle. There’s a lot of material to get through so I might say “Hold it” or “Find me after” but if there is anything that would help your understanding feel free to raise your hand classroom style and I’ll try to answer it. Without further ado I’ll get going.
What is a Bitcoin Core Developer?
So first off, you guys are all here to become a Bitcoin Core developer conceivably or at least figure out what that means. So what is a Bitcoin Core developer? There are two things that can make you a Bitcoin Core developer. You’re either a contributor which means you are going to be writing code and suggesting that the reference client should include it. You might be writing tests and you might also be reviewing and commenting on other PRs or doing general research and analysis on protocol changes. Then you could be a maintainer. A maintainer would be somebody like Wladimir who is the lead maintainer. This means you have commit access and you look at all the work everyone is doing and you decide which things get included into the reference client. People often think that being a maintainer is a big honor and everybody aspires to be a maintainer but actually most people don’t want to be a maintainer. It is a somewhat janitorial role and there is a big responsibility of having the keys for signing code that gets accepted by everyone. It is a liability for a lot of people. Somebody like Greg Maxwell said “I’m not that active anymore” so he gave up his keys. Gavin, his keys were also removed at some point because he wasn’t committing anymore. It is just like an extra role. Most people are Bitcoin Core contributors but if you’re either one you’re a core developer.
What is a Bitcoin Core Developer? What is a Professor?
So that’s a really strict definition. I just told you the most boring thing. It is just someone who writes code. More generally what’s a professor? We know that a professor is someone who teaches classes at a university and that’s the strict definition. If you do that you’re a professor. But when we say a professor we generally mean you are somebody with a distinguished research career. You do research and actually at most universities they say the best people that teach are people who don’t want to teach and just want to do research because they are going to be very efficient at their teaching or really bad at it. It is kind of similar to how a core dev is a coder. It is something that core developers do but generally they are known and it is an esteemed position because they introduce a lot of influential ideas to the community. They serve a very responsible role in discourse and dialogue. You don’t need to be a core dev to do that. There are a lot of other people that don’t actually contribute code who do contribute lots of high quality and interesting ideas. But it is just a general distinguished thing that if you can write code for the reference client you are considered to have surpassed some threshold. Lastly, being a core developer doesn’t necessarily mean that anything you say carries an endorsement or is going to be smart. Core devs often times say pretty stupid things as I’m sure I’ll say something stupid today. It doesn’t necessarily mean that they understand it better than anyone else.
In this talk I’m going to have three major sections. Now we at least have some sort of idea of what it is we’re trying to figure out. We’re going to talk about the Bitcoin development process. We’re going to talk about Bitcoin Improvement Proposals. And then we’re going to talk about a little bit of Bitcoin performance engineering which I think is an approachable topic for getting started.
So You Want to Be a Bitcoin Developer
There are a couple of things to go over. There’s foundations. There’s your development environment. What you do when you want to start coding in Bitcoin. How you contribute. What style you should go with and how you engage with other people. And some general advice for surviving in the space.
Understand Development Philosophy
It is really critical to understand the philosophy of Bitcoin Core contributors. Just because this is the philosophy of people there right now doesn’t mean this needs to be the philosophy forever. Because when you come and contribute to the community you have your own philosophy that you bring to the table. In general most people share these perspectives which are that you want to respect all different kinds of Bitcoin users. You say for my use case, I’m an exchange or I’m a miner or I am just a user, you go “My use case is valid but yours is too.” You are writing software for everyone not just for your own use case. That said you have to scratch your own itch. If there is a feature you want you can’t just bang your fists on the table and say “Make this happen for me.” You have to write the code yourself. If there is something that you really need you need to be your own advocate for that. If there is something that somebody else needs and they’re not a developer it can be your job as well to be empathetic and hear that somebody else is asking for something and pick it up for them if they don’t have the skills themselves to do it. Bitcoin use in general, people think of it as free speech. Anything that is trying to tell somebody else how they should use the protocol is in some sense frowned upon. There are exceptions to that. Slow and steady is good. This is security critical software. Your own ego is much less important than Bitcoin progressing and being secure. Even though you may have done a lot of work if it is not up to the quality standards of Bitcoin it is not going in. That can sometimes be personally difficult to navigate for almost everyone. Lastly, not everyone has the same goals for the project. Some people have a really long term view that Bitcoin is going to be driving payments for everything. Other people really feel Bitcoin is going to be the reserve currency. Other people think smart contracts are the best thing. 
We all have to find a way of using the common platform together and coexisting. You have to keep in mind when you’re talking to somebody that they may have a very different end goal for the technology.
For communications, there is a lot of material that exists online. If you want to go and ask somebody a specific question because you think that they would be able to find it faster than you, ask them. But for most things you can find it by searching and it has been talked about before. Maybe on IRC, maybe on the mailing lists. If you just go and bombard people with questions which often happens you will sometimes annoy them. Some people are very patient. I know that Greg Maxwell will endlessly tell you why you’re wrong about something. It is great. A lot of people have used that as a resource to be able to learn really intricate things about the protocol. But you have to keep in mind that that’s not what they’re trying to do all day. They have other things that they want to do. Doing a lot of research is good because there is a lot of material. Organizationally it is not great but there are people like kanzure, Bryan Bishop who have put together really excellent archives of a lot of the information and transcripts from talks. You can look at the scalingbitcoin.org website and there’s transcripts and video recordings of all the talks people have been giving over the last few years. The information is out there. It is a little bit of your job to do it. Lastly because people do use IRC if you want to talk to people there that can be the best place to talk. But you have to keep in mind that it is a single threaded conversation so it is a little bit like jumping into somebody’s tweet stream and tweeting in the middle with a totally unrelated question. If people are talking about something, be respectful, wait your turn and don’t demand answers. The main channels; there’s Bitcoin Core Slack that you can check out, there are these IRC channels that get used, the GitHub issues are good to follow as well. And then there are some Linux Foundation mailing lists that get used for announcement related things. If you have a big proposal or a new version comes out. 
And then there’s also a StackExchange and the StackExchange is probably the best place to go and ask questions for understanding something that you don’t think is a new research question. I personally would recommend that you first join the Bitcoin Core Slack. If you join that, that has mirrors of all the IRC channels so you will be able to be a listener and see what people are talking about for a while but you won’t be able to post. Everybody else is typically around there so you can still ask them questions if you want to. IRC is a little bit hard to use for some people, myself included. I don’t like IRC that much so I don’t hang out there. Everybody does their own thing. Lastly, GitHub. It is important to understand that this is really for code, not for discussion of ideas or anything that is not directly code related. A lot of people will show up and say “Hey I have this new idea for some crypto primitive” and they’ll post it in GitHub or they’ll say “Hey, Bitcoin should really do this over the next ten years.” That’s not really what happens on GitHub, the types of things that you talk about there would be like “This module can be improved by doing x”. That’s the focus of that. The other channels are better for more hypothetical exploration.
So your development environment. What tools do you need to actually get started? Does anyone have any questions by the way on the last section of foundations? This is recorded but there are also recordings of when I presented some of these slides previously. You can also find those as well.
Fork, Clone, Build
It is pretty simple. You fork it, you make your own version if you want to have local branches. You clone it and then you start doing development. I’m guessing everyone here is familiar with how Git works. If not there is a lot of great material online explaining it. Or at least explaining how to unmess up your repository after you’ve messed it up.
Run Bitcoin Nodes
To run a Bitcoin node you first run your build commands by following the build instructions. There are some dependencies that you have to make sure you get right. The instructions are all there. I recommend if you don’t have a recent Ubuntu desktop running, just use a Docker instance or a VM of some kind or an Amazon server because you don’t want to spend too much time messing around with a Windows build or something. If Nicolas Dorier is watching he’ll tell me Windows is the best but you don’t want to add extra complexity when you’re just getting started. So build it, the first build is going to be really slow because nothing has been built yet, so it is going to take like an hour depending on what your computer is and if you’ve built some of the dependencies before. You’ll get through that and then, because the build is incremental, it will be a lot faster to rebuild. Now you’ve built it you can run bitcoind, you can run one with debugging information being output. You can run one on the testnet, you can run one in a regression testing mode. This is now your playground. There is a lot of information online about what you can actually do once you’re running a Bitcoin node. If you run one that is just plain ./bitcoind on your computer it might not be the best experience because you’re going to have to synchronize the entire blockchain which could take a couple of hours depending on your network connection and which computer you’re running it on. If you’re just experimenting you probably want to run it in the regression testing mode which isn’t going to do any work and is going to let you do fun things like make as many Bitcoin as you want for yourself. Definitely it is fun to open up the Bitcoin wallet in regtest mode and see “I have 10,000 Bitcoin, I can now retire.”
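As a rough illustration of the regtest remark above, the following Python sketch estimates how many blocks you would have to mine before a wallet shows a given spendable balance. The parameters are assumptions that are not stated in the talk: a 50 BTC starting subsidy, a halving every 150 blocks on regtest, a 100-block coinbase maturity and no fees. The function names are made up for this example.

```python
# Sketch only: models block rewards on a regtest chain.
# Assumed parameters (not from the talk): 50 BTC initial subsidy,
# halving every 150 blocks, coinbase spendable after 100 more blocks.
COIN = 100_000_000  # satoshis per bitcoin


def block_subsidy(height, halving_interval=150):
    """Subsidy in satoshis for the block at the given height."""
    halvings = height // halving_interval
    if halvings >= 64:
        return 0
    return (50 * COIN) >> halvings


def blocks_to_mine(target_btc, maturity=100, halving_interval=150):
    """Chain height at which matured rewards reach target_btc.

    Returns None if the total subsidy runs out before the target.
    """
    target = target_btc * COIN
    matured = 0
    height = 0
    while matured < target:
        height += 1  # mined blocks start at height 1 (0 is genesis)
        reward = block_subsidy(height, halving_interval)
        if reward == 0:
            return None
        matured += reward
    # The last reward counted needs `maturity` more blocks on top of it.
    return height + maturity
```

Under these assumptions `blocks_to_mine(50)` gives 101 and `blocks_to_mine(10_000)` gives 351, so the 10,000 Bitcoin in the joke is a few hundred instantly mined blocks away.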
As you start exploring the codebase one of the most important tools that I find a lot of developers actually don’t use is something called ctags. Ctags essentially compiles a list of locations for names in your codebase and when you’re in your favorite text editor, emacs or vim or Atom or whatever, there is a special set of keyboard shortcuts that will let you jump to where something is defined in the code. For getting into the codebase originally something like ctags is really helpful because what you should do is read a line and say “Hey do I actually know what this object is that’s being created? No” and then go use the ctags symbol to jump and read the definition of whatever that is. As you explore through this depth first search of the code you’re going to get a really good understanding of what actually is going on. If you don’t do that there’s a lot of things that you’re going to miss about how the things are actually made. C++ is easy to read sometimes but it is very difficult to write. As you look at how more of it is actually written in some of the more nuanced classes I think you’ll get a better understanding of what types of practices are used in Bitcoin.
Pick a Good Bad Idea
As you begin to code, the best way to get started is to find a really bad idea and then just do it. It doesn’t matter what it is. Pick something that is maybe too ambitious, pick something that you think is going to be fun and just try it out. One of the first projects that I did when I was programming in Bitcoin was I said “Hey there is a database and in that database none of the accesses are randomized so if some attacker were to figure out a bad access pattern for that database they would be able to make the entire network go down.” So what I did was I said “Well what I can do is encrypt every entry”, hashing I don’t think actually works for this case. But you can encrypt every entry which has the effect of randomizing the order of all the databases which means that any attacker who is trying to do a network wide attack won’t be able to trigger it on more than their own node. So I felt pretty good about this, I thought it was a cool idea. So I asked a developer, Pieter Wuille, sipa for some feedback and he was like “That’s a terrible idea because eventually we’re going to want this to be in a well defined order. We don’t need it right now so you could do it but it is going to limit what we can do in the future.” This is something that you are going to hear a lot as you explore. This is technically fine but we want to do something later on and this is going to interfere with this future goal. It also wasn’t sufficiently justified because even though I was able to say “In theory if somebody did have an attack on a database”, what I really needed to do was to show an attack on a database so it was relying on this bad access pattern across the network. Because I wasn’t able to do that there wasn’t sufficient justification for the code I was working on. The good thing was I learnt a lot from doing this. 
I went into the internals of how the block structures and transactions are blobbed in and out of the database and I learnt a lot about how Bitcoin manages those things. Then I said “I’m going to throw it away and move onto something else.” This is what I recommend you do for your first thing. If you want even try to do this yourself, try to do this exact same thing. It is not going to get merged but you’re going to do a small project and you’re going to learn something about how the code works.
Build C++ Expertise
The next thing that is important is to build C++ expertise. How many of you feel like you are C++ virtuosos? Like two people. Most people I think don’t really do that much C++ and getting to virtuoso C++ is something that even most people on Bitcoin won’t say they do but you need to be really familiar with how it works. There’s this blue book here by the person who wrote C++, Bjarne Stroustrup. Go and get a copy of this book on Amazon or online somewhere and read like a chapter everyday as you’re exploring. Some of it is going to feel really introductory. This is because C++ is the standard language that everything is modeled after in some senses. Anything you’re looking at today that is an object oriented system is going to really look similar. You’re going to be reading it and you’re going to go “This is easy, I know what is going on.” But there is a lot of nuance and you’ll find when you actually go to write something it is going to be much more difficult. This is true in anything. Reading a book is easier than writing a book. Reading this guide is going to give you a lot of insight for why things are written the way they are and what types of pitfalls and traps you’ll run into. Get like 25 chapters into this book and then keep it on your shelf as a reference. There’s also a site cppreference.com, that’s a good one to use. That will detail the exact functionality of any standard library that’s used in C++.
Next, how many of you are familiar with gdb and have used it extensively? Again not so many people but this is a tool that lets you take a running program and inspect all the memory of that program and the code that is actually running if you compile it with debugging symbols. This is really cool. This lets you basically do brain surgery on your program and see exactly what is happening. If you introduce a bug and something isn’t quite working as you expected it to work, gdb is going to let you add a break point into that code and say “Let me see exactly what the state is of the code” without having to add print statements. Sometimes you’ll find bugs where adding a print statement changes the behavior, because print statements have to take a lock and that can accidentally hide a concurrency bug. You remove the line that is printing something and then you have the bug. Then you add the line back and the bug is gone. gdb will let you find some of these things as they are going on. There are a lot of tools around gdb for inspecting running programs. The compiler can play tricks on you. Sometimes you’ll see interesting things where the code that actually gets run is so heavily optimized by the compiler that you’ll be stepping through the code line by line and then it will jump ten lines forward and then jump ten lines back and jump ten lines forward. You’ll say “What is going on?” The compiler is allowed to do things in any valid order that is allowed by the C++ standard. Some fields may not be initialized in the order that you’re expecting and that can sometimes be the source of bugs as well.
A quick cheatsheet of things you can do. You can step through your code. You can set breakpoints at various functions. You can say “Stop the world whenever we hit this specific line.” You can print out things about the variables in memory or the raw memory if you know the addresses already. You can also look at different threads. If you’re writing a multithreaded program gdb is going to help you see what every single thread is working on at the moment. There are only a couple of places that are multithreaded in Bitcoin but it would help you with those as well.
Review Others’ Code
Another thing that is really good to do as you start exploring how to be productive is to review other peoples’ code. There are certain people who only review code. They don’t actually really write that much code, they just read what other people have written. It is really important and it is going to help you get not only an understanding of what is good code but you’re going to learn what topics people are prioritizing right now. You’re going to see what things are currently being worked on. You’re going to get to know how people communicate feedback. Sometimes it can be pretty aggressive. I don’t think Bitcoin is quite as bad as Linux where Linus will tell you “Get this f***ing piece of s*** code out of my codebase.” You can read the Linux mailing list and he will say that to people. It is definitely not that bad but people will definitely be like “NACK” which means negative acknowledgement and that means get this code out of here. If you’re a new contributor and somebody NACKs you… people are pretty good about not doing that but it still happens on occasion. Other people will step in and be like “It is a good idea but it is not quite what we need because we do this other thing.” Lastly, if you leave useful feedback for someone they are going to be pretty grateful in a certain sense. If you go in and you’re like “I don’t like the style of your code” and it is somebody who is established they’re going to be like “This is the style guide”, it doesn’t really matter. If you say “Hey I think that this function that you wrote could be a little bit faster if you did it this way” it is not a critique of them. You’re helping them improve their code and everybody wants to improve Bitcoin so people will be pretty welcoming of that. That’s a way to make people happy with you. And then when you go and write code people who you have made happy are much more likely to return the favor and check out the things that you’ve worked on as well.
Let’s See What’s Hot
I guess if we can just take a look right now. This was taken earlier today and shows what is going on and what people are looking at. Can you guys read that? I’ll read aloud. These are labels for current pull requests that people are working on or issues that they’re looking at. These are sorted by number of open things. So a lot of people work on wallet code. A lot of people are focused on how do we actually hold and manage the Bitcoin that people are entrusting to Bitcoin Core. A lot of effort is put into the RPC library. What utilities do we allow people who run a Bitcoin node to access? There’s a lot of effort that goes into the graphical user interface. I don’t personally do anything on that but people do put a lot of energy in. People do P2P work which is on the peer-to-peer protocol, that is how nodes connect to one another. There’s features, there’s refactoring, brainstorming, tests, build system, validation, docs, bugs, block storage, UTXO databases and then resource usage. I’m going to step through most of these with a little bit more detail just to give you some idea of what kinds of projects people do in each of these spaces.
For the wallet, this is the end user functionality for keeping your balances, detecting that that money has been sent to you and making new transactions. Current things that people are working on are separating out the wallet and node processes. Russell Yanofsky (ryanofsky) is a developer who is working a lot on this. Right now when you’re running a Bitcoin wallet it is running in the same logical process as your node which means that if your node got hacked by one of its peers, it would be able to see all the memory of your wallet which could have your private key in it and that would be patently bad. Luckily there are measures that help prevent this. In general it would be really nice if you could say none of the wallet code is sharing anything with the node code. They are completely separate applications and that’s not the case right now but people are working on it. SPV verification is another thing people are working on. This would let you take a Bitcoin Core node and say “Here’s another node that I trust. Let this one prove certain facts to me and I’m going to believe that they’re going to tell me mostly the truth but I’m going to be able to check small statements about that.” That would allow you to run the equivalent of a full node on your cellphone and just connect to a semi-trusted Core node that maybe your friend is running. People are working a lot on that. There’s improvements for some of the APIs available in the wallet around accounting. In the early days of Bitcoin there was a funny tool called Accounts. You could say “This is my Bitcoin account for my drug money. This is my online poker money.” Because they were actually accounts and not UTXOs the way we think of them in Bitcoin. You could spend money from your poker account and go into a negative balance in your drug money account. The whole API didn’t make sense to anyone so there has been a big effort to rename it to Labels. 
You could say I got this money because of this but it is not separate from any other fund and make that API improved. There’s MultiWallet which is going to allow you to use multiple separate wallets. This is more like accounts as you would think of them within a single application. Right now if you want to use a different wallet you have to restart your node completely. This would let you load them simultaneously. A developer Luke Dashjr is working a lot on that. Lastly, privacy. People think a lot about how you can make the transactions that you make without revealing more information about yourself than you intended to do. There’s a lot of interesting stuff happening right now around how you pay fees, how you make change and how you sort and order transactions. These are all things that any of you right now could just go and jump in on and start working on. Obviously with a day or two of reading to see what everyone is actually doing on these.
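The difference between accounts and labels described above can be made concrete with a toy model. This is not Bitcoin Core's wallet code: both classes and all method names are hypothetical, and amounts are plain integers. The point is only that an account is a bookkeeping number layered over one shared pool of coins, so it can go negative, while a label is just a tag on money that was received.

```python
# Toy model only: hypothetical classes, not Bitcoin Core's wallet.

class AccountsWallet:
    """Accounts: one shared pool of coins plus per-account ledger numbers."""

    def __init__(self):
        self.total = 0    # coins the wallet really controls
        self.ledger = {}  # account name -> bookkeeping balance

    def receive(self, account, amount):
        self.total += amount
        self.ledger[account] = self.ledger.get(account, 0) + amount

    def spend(self, account, amount):
        # Only the shared pool is checked, so the ledger entry for
        # one account can drop below zero.
        if amount > self.total:
            raise ValueError("insufficient funds in wallet")
        self.total -= amount
        self.ledger[account] = self.ledger.get(account, 0) - amount


class LabelsWallet:
    """Labels: a tag recording why money arrived, with no separate funds."""

    def __init__(self):
        self.received = []  # list of (label, amount)

    def receive(self, label, amount):
        self.received.append((label, amount))

    def received_by_label(self, label):
        return sum(a for (l, a) in self.received if l == label)

    def balance(self):
        return sum(a for (_, a) in self.received)
```

For example, after receiving 5 into a "poker" account and 1 into a "savings" account, spending 4 from "savings" succeeds because the pool holds 6, and leaves the "savings" ledger at -3. With labels there is no per-label balance to go negative in the first place.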
So the RPC. This is also fun. I’m doing some work in this space myself. This is how you manage your Bitcoin node. This is how you tell it “Hey. Connect to this other node I know about. Don’t connect to that one. Let me know what the most recent block you’ve seen is. Does this transaction exist?” That’s all the things you can do in RPC. It’s your driver’s seat. It is important if you are building applications on top of Bitcoin. So if you’re building an exchange you want lots and lots of features in the RPC that may or may not exist right now. You may extend them privately. If you’re going to get hired let’s say at Coinbase they might ask you “Hey, add these features for us” but they might not get added to Core. Or you could add them generally available for everyone. Right now I’m working on a new feature called RPC whitelists which lets you provide an authorization list for what things a specific user is allowed to do when they access your node. Other people are working on features that let you control the performance of your node so you can tell it dynamically how many threads to run. RPC is remote procedure call. Sorry if that was jargon for you guys. So generally it is like a JSON RPC so you can send it a JSON saying “Make a transaction to this person” and it will make the transaction. So it is a low level way of interacting with your node. Also people are working on better usability. Sometimes it will be a query that gets frequently used by an application and it requires them to make one initial query to get some list of transactions and then for each one of those transactions they now need to make another RPC. That’s what’s called, if you guys are Rails developers, the n+1, they talk about that a lot because in Rails it is really hard to not write n+1 queries because you have to do the first one and then you have to do n for everything it returns. 
People are working on things like that that are going to make it more usable for application developers, also something that can be good to do if you look at some applications that people have built on top and you go “They’re doing one of these complicated queries where they’re calling RPCs in a loop, can we make it a single RPC?”. It is going to improve the performance of that person’s node.
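To make the n+1 pattern above concrete, here is a small Python sketch that only builds JSON-RPC request bodies and sends nothing. It assumes the `listtransactions` and `gettransaction` wallet RPCs and a JSON-RPC endpoint that accepts an array of requests as a batch; the helper names and txids are made up for illustration. Batching saves round trips but the node still executes n calls, so a purpose-built single RPC of the kind described above would go further.

```python
import json


def rpc_request(method, params, request_id):
    """Build one JSON-RPC request object (illustrative helper)."""
    return {"jsonrpc": "1.0", "id": request_id,
            "method": method, "params": params}


def n_plus_one_bodies(txids):
    """One HTTP body for the listing call, then one per transaction."""
    first = rpc_request("listtransactions", ["*", len(txids)], 0)
    follow_ups = [rpc_request("gettransaction", [txid], i + 1)
                  for i, txid in enumerate(txids)]
    return [json.dumps(req) for req in [first] + follow_ups]


def batched_body(txids):
    """All per-transaction lookups packed into a single HTTP body."""
    batch = [rpc_request("gettransaction", [txid], i)
             for i, txid in enumerate(txids)]
    return json.dumps(batch)
```

With three txids, `n_plus_one_bodies` produces four separate request bodies while `batched_body` produces one body containing three calls.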
I don’t do any work on the graphical front, I’ve done like one or two things so I can’t tell you that much. People again are working on this process isolation thing. You want to separate out the interface from the wallet and from the node. People want to expose more RPC functionality. The RPC is like power users’ tools and things get added to the graphical user interface maybe a year or two later. People are always working on adding some of these RPC tools. People also just work on general usability and performance. They want things to be fast and not laggy.
In the peer-to-peer layer this is a really fruitful area for research and development. This is the general problem space of how do we connect all these nodes together and broadcast blocks and transactions without the network going down completely. So people look a lot at things like denial of service. They say “What sequence of messages can I send to the node and cause it to fail?” They also work on migrating to more reviewed standards so we’re moving right now from this custom network event library to something that is really standard called libevent. People are working on something called SPV block filters. You can think of the peer-to-peer stuff as the RPCs between nodes. There are the RPCs for your own node that you’re trying to use but these are things that anyone on the network or in the world can ask you to do for them. These SPV block filters would be really useful for building these lightweight wallets that I talked about earlier. There’s also a lot of science going in, I’m talking about computer science papers people end up writing, about how you detect bad peers and how you find good peers. Recently there was a lot of news around this because Ethereum had a really big problem where you could end up connected to a lot of bad peers and they were able to fix it. Bitcoin also has a lot of research spent on this. People also think a lot about privacy leaks. How do you prevent people from knowing specific things about which node you are? There is a new protocol that people are working on called Dandelion where if you send a transaction to someone, if they haven’t seen it already there is a reasonable chance that you were the person who created that transaction. Or at least if you think about the graph, you are closer to the person that created it. There are companies out there, Chainalysis for example that make a lot of money by having lots of nodes on the network and figuring out who created a transaction.
So people think a lot about how we can improve the privacy of spending Bitcoin and one of the main things is by not leaking data about which node made the transaction. Dandelion is something that you can Google if you are interested in those kinds of problems. It works by having a dandelion pattern. You send it to one person, they send it to another, they send it to another and then after a couple of hops randomly it will explode and send to every node that they know about. People also look at reducing bandwidth, it is really critical. Right now with Bitcoin we assume that there is an internet connection. We don’t have to assume that. There are ways of making the network work without an internet connection, with a satellite connection and with other things. People think a lot about “Hey if bandwidth becomes really, really expensive or there are network problems how do we make the bandwidth as small as possible?” So one of the major topics is something called compact blocks which I will tell you more about later if time allows which is one of the biggest savings in bandwidth that has ever happened in Bitcoin. It makes blocks propagate a lot faster throughout the network because it reduces the bandwidth of every new block found.
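The hop-then-explode pattern described above can be sketched in a few lines of Python. This is a deliberately simplified model and not the actual Dandelion proposal: the function name, the graph representation and the `fluff_probability` parameter are all assumptions made for illustration. The transaction is handed to one random peer at a time, and after each hop there is a fixed chance that the current holder floods it to the whole network.

```python
import random


def dandelion_broadcast(graph, origin, fluff_probability=0.1, rng=None):
    """Simulate a stem phase followed by a flood from the last stem node.

    graph maps each node to a list of its peers. Returns (stem, reached):
    the path the transaction took hop by hop, and the set of nodes
    reached once the last stem node floods it.
    """
    if rng is None:
        rng = random.Random()
    # Stem: pass the transaction to a single random peer at a time.
    stem = [origin]
    current = origin
    while True:
        current = rng.choice(graph[current])
        stem.append(current)
        if rng.random() < fluff_probability:
            break
    # Explode: the last stem node floods to everyone reachable.
    reached = {current}
    frontier = [current]
    while frontier:
        node = frontier.pop()
        for peer in graph[node]:
            if peer not in reached:
                reached.add(peer)
                frontier.append(peer)
    return stem, reached
```

An observer who first hears the transaction during the flood sees it coming from the last stem node, which is at least one hop away from the node that created it.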
Refactoring is something that is also pretty critical. I would say this isn’t the best thing to do just coming into it for major projects but there are definitely little things that are easy to do if you are just getting into it. To refactor you really have to have an understanding of where the code is, where it has been and where it should go. This is something that you get as you’ve seen the codebase evolve for a while, what needs to be cleaned up. Bitcoin’s codebase is not fantastic, there is a lot of weird stuff in it. You wouldn’t hire somebody to write a codebase like that today. It turns out it is a great codebase. It is really secure, a lot of things are really fast and well engineered but it is not modular and compile times aren’t fantastic, it uses a lot of global variables etc. So people work on these things. They try to split modules into smaller logical units. They work on making standard libraries for some basic functionality that should always be the same no matter what version of Bitcoin you’re using. What is proof of work, what that definition is should stay the same across versions. Splitting big files, so if there is a file that contains 2000 lines and it does two different things, you try to split that into two different files. One of the major ones that happened recently and is still in the works is splitting the main.cpp file into two files: one called net_processing.cpp and one called validation.cpp. net_processing handling all the network operations and validation handling all the functional correctness properties. Getting rid of libraries that are not part of the C++ standard is also a priority. Some people want to get rid of something called boost, if you are a C++ dev you both love and hate boost. You want to also get rid of globals because global variables make it harder to analyze your code, harder to test. Making everything as explicit as possible, I call this constifying, you could also call this making C++ look more like Rust. 
If you don’t need something, if you don’t need to be able to modify something it is really good if you can specify in the code that the thing should not be able to be modified by the person who has it. That would make it easier for you to check that that thing doesn’t accidentally get modified or somebody introduces a bug later. DRYing up code, there are a lot of things that are duplicated in the codebase so making things a little bit tighter semantically and reusing things that are used multiple times is good. And then also in refactoring are minor performance fixes. If you say “This thing is making an extra copy and I fixed it” that’s a small refactoring that would get this label.
Tests are also really critical. They also can be a little bit hard to do if you are just getting into it because you have to understand the codebase. But they are also good because you have to understand the codebase to write them. There are two kinds of tests in Bitcoin, there are unit tests and behavioral tests. The unit tests are testing some small piece of the code like a single function. The behavioral tests are saying “If we have a sea of nodes connected to one another do they do the correct thing over time?” So contributing on any of these is great. They are slow so making them faster is something that will make everyone love you because every single developer is spending hours a day just waiting for their tests to come in. Adding more simulation tools is really good. It is actually a problem that there are no good utilities for simulating over historical data in Bitcoin. To say “I think this is an improvement but how would it handle all of the data that has happened in the past?” Increasing the reliability of the functional tests, sometimes they mysteriously fail so it is a perpetual problem of making sure that the tests are themselves correct. It is an exercise in itself that you do through seeing when they fail and understanding if it is the code or the tests that’s the problem. Then there are other more exotic kinds of testing that are starting to make their way into Bitcoin. Fuzzing and more static analysis tools and also something called property testing are becoming major topics. I’m not going to give you a laundry list of all these things but you’re going to be able to go back to these slides and have the right things to Google for as you’re trying to figure out what topics are exciting and fun for you.
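To make the difference between a unit test and a property test concrete, here is a small self-contained sketch. It is not taken from the Bitcoin Core test suite. The function under test is Bitcoin's CompactSize integer encoding, chosen here only as a convenient example, and the function names are made up for this illustration.

```python
import random
import struct


def encode_compact_size(n):
    """Encode an unsigned integer using Bitcoin's CompactSize format."""
    if n < 0 or n > 0xFFFFFFFFFFFFFFFF:
        raise ValueError("value out of range for CompactSize")
    if n < 0xFD:
        return bytes([n])
    if n <= 0xFFFF:
        return b"\xfd" + struct.pack("<H", n)
    if n <= 0xFFFFFFFF:
        return b"\xfe" + struct.pack("<I", n)
    return b"\xff" + struct.pack("<Q", n)


def decode_compact_size(data):
    """Decode a CompactSize value from the start of `data`."""
    first = data[0]
    if first < 0xFD:
        return first
    width = {0xFD: 2, 0xFE: 4, 0xFF: 8}[first]
    return int.from_bytes(data[1:1 + width], "little")


def test_known_values():
    # Unit test: a handful of hand-picked inputs with known outputs.
    assert encode_compact_size(0) == b"\x00"
    assert encode_compact_size(252) == b"\xfc"
    assert encode_compact_size(253) == b"\xfd\xfd\x00"
    assert encode_compact_size(0x10000) == b"\xfe\x00\x00\x01\x00"


def test_round_trip_property(trials=1000, seed=0):
    # Property test: for many random inputs, decoding an encoding must
    # give back the original value. No expected outputs are listed.
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.getrandbits(rng.randrange(1, 65))
        assert decode_compact_size(encode_compact_size(n)) == n
```

The unit test pins down specific cases at the boundaries. The property test states a rule that should hold for every input and lets random data look for a counterexample.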
Validation I would say as you’re getting in, newbs also come in and want to do something in consensus because that’s like the heart of it. You’re not going to get anything done in consensus, nobody gets anything done in consensus. You might as well ignore it. It would take years to make a major change to how consensus works so it is not a great place to start and the semantics are really complicated. That said, take a look and see what is going on. There are opportunities for that and looking at the performance of it, looking if there are denial of service things, ways you can make a node chew on a block for a really long time and not be able to process new information. And then there are some refactorings that are still in progress that you are able to help with as well.
Lastly, the last topic that I think is really good is resource usage. This is just figuring out “Is my node doing something it doesn’t need to do?” People are always finding things like a megabyte of memory here or there that is getting allocated that doesn’t need to be allocated. That might seem small but people try to run Bitcoin Core on Raspberry Pis, on old ones. If you want people to be able to do that you need to heavily make sure that Bitcoin isn’t doing anything it doesn’t have to be doing. Tightly looking at the resource usage is something that people do. Using custom data structures, that’s something that I’ve done and I will tell you more about a custom data structure that I wrote that is a lot more optimized for Bitcoin. Even just using the correct data structure. Sometimes people use standard map which is a tree when they should be using an unordered map which is a hash table. Finding those little things can actually make a pretty big difference in critical parts of the code. Reducing memory allocations in general, looking at the operating system interactions and making sure that we’re being a good citizen on the platform. There are a lot of times when if you are running a Bitcoin Core node and you’re also browsing the web, all of a sudden your whole computer will lock up because it is doing some really expensive operation. So people are looking at “Can we make it a little bit less aggressive and not cause disruption for somebody who is just trying to use it on their normal computer?” Caching and arena allocators, there’s lots of fancy performance things and definitely if you’re into the performance engineering side resource usage is the thing that you want to look at. Does anyone have any questions about any of those topics? I know it was a lot but I think we’re going to be a little bit more abstract now.
Q - What is fuzzing?
A - Fuzzing is where you send random noise to a node and you see if it does something. When you receive a random message, unless it is actually a valid message that you were expecting of some other kind, you shouldn’t do anything. Then there is more specific fuzzing which is like given a well formed message with random data in all of the fields does it do anything? And then there’s a more specific one which is given a well formed message with the fields correct but the messages in random orders, does it do anything? It is basically hitting a node with lots of garbage data that is of varying qualities of junk. Sometimes you can get surprising things to happen that you didn’t know were a part of the current functionality. That’s one way you get a little bit of confidence that somebody isn’t going to accidentally take out the network.
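The first two levels of fuzzing described in that answer can be sketched against a toy message handler. Everything here is hypothetical: the `TOY!` framing, the command numbers and the function names are invented for the example and are not Bitcoin's wire format or Bitcoin Core's fuzzing harness.

```python
import random
import struct

MAGIC = b"TOY!"            # hypothetical 4-byte message prefix
KNOWN_COMMANDS = {1, 2}    # hypothetical command numbers


def handle_message(raw):
    """Return (command, payload) for a valid message, otherwise None.

    Frame layout: magic (4 bytes), command (1 byte), payload length
    (2 bytes, little endian), payload. Invalid input is ignored and
    must never raise an exception.
    """
    if len(raw) < 7 or raw[:4] != MAGIC:
        return None
    command, length = struct.unpack("<BH", raw[4:7])
    payload = raw[7:]
    if command not in KNOWN_COMMANDS or len(payload) != length:
        return None
    return command, payload


def random_bytes(rng, count):
    return bytes(rng.getrandbits(8) for _ in range(count))


def fuzz_random_noise(rng, rounds):
    """Level 1: pure noise. Returns every message the handler accepted."""
    accepted = []
    for _ in range(rounds):
        result = handle_message(random_bytes(rng, rng.randrange(0, 64)))
        if result is not None:
            accepted.append(result)
    return accepted


def fuzz_random_fields(rng, rounds):
    """Level 2: well formed framing, random data in every field."""
    accepted = []
    for _ in range(rounds):
        payload = random_bytes(rng, rng.randrange(0, 16))
        header = struct.pack("<BH", rng.randrange(256), rng.randrange(16))
        result = handle_message(MAGIC + header + payload)
        if result is not None:
            accepted.append(result)
    return accepted
```

A fuzz run passes if the handler never crashes and if everything it accepted really satisfies the rules, for example that every accepted command is in `KNOWN_COMMANDS`.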
Here’s One of Mine
So now I’m going to tell you about something I’ve been working on recently just so you can get an idea of what this actually looks like for something people work on. I’ve been working on this thing called RPC whitelists. I’ve been working on it, not full time but a side project, for a couple of months.
Let’s See What Happened
I posted it after I filed an issue, described what I wanted and gave a proposal for what it would be, and then I added an implementation. Right away everyone was like “this is great.” You’ll see people say utACK, that means untested ACK. That means “I like your code, it looks good, I read it but I never tried it myself.” Then you see people say Concept ACK that means “I love this idea, I have no clue if you did a good job with it, I haven’t even read the code but good for you for doing this.” That is pretty weak but it means that if you did well they want this thing merged. That’s actually more important than a utACK in some senses because the utACK just says the code is correct. The Concept ACK sometimes means more because it carries the weight of the person saying “I like this idea in general.” Then people will tag other people who they think would be relevant for that. They’ll say “You should take a look” and other stuff. Then I get somebody who goes “I don’t like this at all.” He says “This implementation is problematic.” I didn’t think it was problematic but he thought it was really problematic. He has good points and this is one of the things of low ego that you have to have while you’re doing this. Somebody can be like “I don’t like your s***, get out of here.” You have to go “Ok why didn’t you like it. Can I make it something that you’d find appealing?” So after a bit of back and forth talking about the general space of the thing, I updated the pull request and I added a couple of the things that were going to make it not problematic. His main complaint was that he didn’t like what I had done, when the feature is enabled, in order to maintain backwards compatibility. That’s something you’re going to hear a lot of people say when you add something new, “You’ve just added this new feature but it needs to work the same for anybody not using this new feature.” You need to maintain backwards compatibility always. 
He said “The way that you do this actually prevents this from having a default safe behavior and you’re adding a security feature so you should find a way to make it default to the safest possible behavior.” With a small tweak I was able to make it default and I think it is a good middle ground but this is still sitting. It is not merged yet. This is something that if you guys wanted to tonight you can go take a look and review and tell me if it is problematic or not for you. That’s generally what it looks like when you try to do a pull request. You wait around and this will probably get merged maybe in two or three months. It is a lot of sitting and waiting. You don’t actually get things in that quickly and they shouldn’t go in that quickly. This is software that people are going to be relying on with billions of dollars. I am very happy even though it is personally depressing, I’m very happy in the abstract that it should take that long. The worst would be if my code were responsible for somebody losing lots of money. That would be awful. Luckily Bitcoin says somewhere in the thing no warranty or whatever. You’re not going to be responsible if that does happen although people may still think you’re responsible. You may lose the court of public opinion but not be in jail.
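The tension in that review, staying backwards compatible while defaulting to the safest behavior, can be illustrated with a small sketch. This is a hypothetical illustration of the idea, not the code or the configuration options from the pull request: the function name `is_call_allowed`, the `default_restrict` flag and the dictionary layout are all assumptions.

```python
def is_call_allowed(user, method, whitelists, default_restrict=True):
    """Decide whether `user` may call the RPC `method`.

    whitelists maps a user name to the set of methods that user may call.
    """
    # Nobody configured a whitelist: the feature is unused, so behave
    # exactly as before and allow the call (backwards compatibility).
    if not whitelists:
        return True

    # This user has an explicit whitelist: only listed methods pass.
    if user in whitelists:
        return method in whitelists[user]

    # Some other user has a whitelist but this one does not. The safe
    # default is to treat a missing entry as an empty whitelist, unless
    # the operator explicitly opted out.
    return not default_restrict


# Example: once any whitelist exists, unlisted users are locked down.
rules = {"monitor": {"getblockcount", "getpeerinfo"}}
print(is_call_allowed("monitor", "getblockcount", rules))  # True
print(is_call_allowed("monitor", "sendtoaddress", rules))  # False
print(is_call_allowed("admin", "sendtoaddress", rules))    # False
```

In this sketch a node with no whitelist configured behaves as it always did, while turning the feature on for one user cannot silently leave every other user unrestricted.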
For testing, as I said earlier there are two kinds of tests. There are these unit tests and there are these functional tests. How many of you use Travis on your GitHub? So you can enable Travis on your fork and then anytime you push a fork it is going to run a bunch of tests on more platforms than you have locally probably. So that is kind of the easiest way but nothing beats the quickness of building it locally and seeing if that works, doing a test if you’ve got to do a really tight loop. I would recommend doing local as much as you can but before you submit it to the rest of the world you want to actually test it locally. Writing new tests and reading old ones is good just to understand. If you are starting to look at a piece of code and you’re going “I don’t understand what is going on here” go read the tests because the tests are going to show you what that code is supposed to be doing. They are going to have simple cases of given this it should do that.
How are we doing on time? I don’t have a clock. Twenty minutes. We’re probably not going to get through everything but I’m going to talk a little bit about contributing.
Important Contributions to Bitcoin
There are a lot of different ways to contribute and really be an important member of the Bitcoin community. You can do documentation, you can have novel ideas and put them out into the ether, not that ether. You can review other peoples’ code and ideas. You can do rigorous testing. Conference and meetup organizing is really important for those in the back. You can make tools for developers but you’re all here because you want to conceivably write some new code.
Write Good Code
The first thing you have to do is write good code. Now that is kind of obvious that you have to do that. It is important to find an issue that you personally think is important because it is going to take a lot of effort to get something in so you don’t want to be doing something that you don’t care about. Find something that you like. Then write something that you think solves it. Try your best to document it as clearly as possible and write some tests that cover the code. Then put a branch of that into your fork.
Seek Early Feedback
Once you have done that, you’ve got something that somebody else can conceivably look at and use and take a peek. Write a message to someone, someone like me. If you know Matt Corallo, thebluematt, Cory Fields, theuni, jonasschnelli. You can treat people like that as your triage nurse. You can go and be like “I’ve got this thing”. Either they can help you directly or they are going to tell you “Don’t ask me, ask this person. Let me connect you with them.” You want to do that because you don’t want to go to the wrong person and you also don’t want to broadcast it out saying “I have the best thing ever” because a lot of people have a lot of context for other things going on. You can even talk to them before you’ve even started working on it and they’ll tell you if it is advisable or non-advisable. If you’re really not getting feedback you can go onto the #bitcoin-core-dev IRC and say “Review this.” There’s a weekly meeting on Thursdays you can go and say “I’m looking for review, I need help with this” and people should be receptive to that. As I’ve said and will say a lot of times you’ve got to be gracious. Negativity on your work is not negativity on you. Everyone is happy to have more people, more eyes on the codebase is why it is secure.
Run New Nodes
I also think that it is good if you run a lot of nodes yourself because then you’re going to be able to get more experience of seeing when things go wrong or when things are not syncing. Hopefully they are always syncing but it is good if you have something that you can always keep on so maybe run a different server for this if you can have something always on in your house. When you’re doing a test of a new feature compare the debug logs. “I had this new node that I made and this old node. Is the new one actually synchronizing as fast as the old one?” If you introduce some regression you’ll get a sense of “For some reason I’m no longer connecting to good peers. What happened?” You can go figure out why your new behavior is not as good. Or if you are beating it now you’ll have evidence to say “Hey I ran this for a month and I synchronized 10% faster all the time.” That would also be good for you having personal assurance that what you’re working on is valuable.
Restructuring Code Changes
This is difficult and something that if you haven’t worked on big, open source projects before or had strict practices at your company you’ll get used to. You want to restructure your code changes to be a small set of logical steps that make sense. Sometimes I’ve done a lot of work for like a month and then I’ll take the final commit and I’ll rewrite intermediate commits between it that had nothing to do with my development process but I felt were instructive for being small steps. Small step semantics are easy for other reviewers to go through because what you say is “I have this version and I think it is bad for this reason. I will make one small change and it is still ok but this is a step towards another thing.” Then you’ll make another thing, a small refactoring and you’ve created these, you’ve synthesized them from nowhere. They’re not actually something that you had to do in your development process. Then eventually you get to your end state. What that allows a reviewer to do is they’re able to quickly go over the beginning ones and say “This seems correct, this seems correct, this seems correct” and then only the complexity at the end is what they actually then spend their cycles on. If you do it altogether they’re going to be thinking about all the refactorings that you had to do along the way to get there. It is going to make it so that your PR is unreviewable and will almost never get merged. You have to get comfortable with the idea of having to rewrite code once you have something that you like.
Open a Pull-Request on GitHub
Once you’ve done that, you’ve made your fork, you’ve asked for feedback, you’ve restructured your stuff so that it is logically coherent for other people to follow along, then it is time to open a pull request on GitHub. Make sure your Travis has passed because people won’t look if it has a red x over it. Then write up a few paragraphs motivating your change, explaining what you did, what types of review you’re looking for, what the future work could be that you’re not going to do and any other detail that you can think of. If you want an example go look through my PRs because I try to always do this. Not everyone does it but it is going to set you up for success if you give a really clear motivation for what you’re working on and why other people should review it and why it is important. At that point you can even ask someone like me “Hey can you read this PR, comment that I’m going to do” and I’ll tell you if there’s something that I would want to know before you open it if you want to make sure it’s going to be received well.
And then you wait. Review takes time, sometimes months. When you get feedback try to respond quickly. Sometimes you’ll be busy on something else. For a reasonably complicated thing it could take months or even a year for it to be merged. That’s ok. The release cycle for Bitcoin is every six months so no one is going to be using the code in the real world for at least six months. There’s no struggle to get it in tomorrow unless you have work that is depending on that getting in faster.
Your First Contribution
Now you see this whole picture, for your first thing I’m going to tell you do something that you really don’t care about. Do something like adding a little bit of documentation, adding tests, fixing a typo or something if you can find one. I write a lot of typos. I write them so other people can find them. Just go do that, it is something that you won’t care about. It will get you familiar with the actual process of getting something merged. Then you’ll get that little badge on GitHub that says you’re a contributor to that project which is nice. If it doesn’t get merged for whatever reason you don’t care, you’ve spent like ten minutes figuring it out. Do that and then while you’re in the middle of that get started on your bigger project. You don’t want to be using something that you’re actually proud of as your first experience for getting something merged because then you’ll be upset when it doesn’t get merged for silly reasons for a long time.
Good to Read
I’m not sure I have too much time left but we’ll do a little bit of general advice on existing in the community. There are a bunch of things that are great to read. Here are all these resources where you can go and spend your afternoons and evenings and days if you’re unemployed like me, just going through and learning lots and lots of stuff. Definitely spend time here. Maybe controversially now but Twitter is also pretty good generally. I think that’s where the good public discourse happens.
Bad to Read
I would avoid reading Reddit or the New York Times or The Economist because they don’t really know what they’re doing and it is a bit behind the times. There’s enough stuff in the Good to Read category that you’ll stay plenty busy.
And socialize. There are a lot of really cool people in the space. That’s why it is fun to be a part of. I think a big reason why most people who are core developers do it is they like interacting with these other really awesome developers. Get to know them. Go to Scaling Bitcoin if you can or Breaking Bitcoin or these meetups. Hang out on Twitter, I meet a lot of really cool people that I wouldn’t meet otherwise just by tweeting at them. Try to make friends because if you don’t have friends you’ll get kind of sad on your own.
I repeat myself a lot because it is important, you have to be patient. It is security software. Given a choice between you being a developer and introducing a bug, everybody is going to take not introducing a bug. If you can’t wait for something to get sufficient review this may not be rewarding enough for you if it is going to be painful. It is really slow. People just constantly nitpick everything about your code. You’ll get upset when somebody finds something that is broken that you spent the last 30 hours proving to yourself that it was correct. It turns out there’s some case you didn’t consider. But at the end of the day once you have things that are getting into Bitcoin it is really high impact. Billions and billions of dollars are relying on that code to be correct and you can make the system significantly better by writing a couple of clever things. That is a reward in itself but it is long term gratification. It is not something that you’re going to feel tomorrow “I’m a Bitcoin Core developer and I’ve done great things.” It is going to take a long time for you to reach that point.
There’s a whole other talk in here that we could get into but time wise we’re limited. I’m happy to take a couple of questions and the rest of this will be online.
Q - In your experience do ideas ever flow from other implementations to Core or is it mostly a one way street?
A - They do. I’m trying to think of a good example. I think if you look at, I’m trying to remember who had it. I’m probably going to get something wrong, somebody will have better references than me. I think something like Segregated Witness was originally done inside of Factom. Does anyone have the proper reference on this? Just believe me. I know somebody else had something else really similar to Segregated Witness. That was known as a problem in Bitcoin for a while. It was fixed for their protocol and then Segregated Witness heavily borrows from the same idea. That is the only way to fix the problem. Bitcoin Cash has a different way of fixing it. That’s an example. The set of people who are professors in the space are people who are core developers who end up seeing from the bottom everything up. There’s not a level lower unless you are a cryptographer. It turns out there are cryptographers who are core developers who understand how all the signatures and hashes are working and have improvements for them. The net flux is definitely outwards.
Q - The reason I’m asking is I could be higher impact faster on other implementations because I don’t know C++ very well. Is that worth it or should I go straight to learning C++?
A - If I were in your position I would make the investment in learning to be a Bitcoin Core dev because I think you’ll probably learn more overall from that community. That is my opinion. There is a lot to learn from Ethereum, there is a lot to learn from Stellar, it is just different things that you will learn. Generally I’ve found that the people in the Bitcoin space know the most about other spaces compared to people from other spaces.
Q - I meant the Golang Bitcoin for example.
Q - Earlier you mentioned when you were talking about reducing bandwidth you said something compact….?
A - That’s compact blocks. That’s something made by Matt Corallo. In the slides that is going to be available there is a whole thing on what compact blocks is. You’ll be able to read that, it is a good introduction. Basically all you’re saying is that a block is composed of transactions and every node maintains a list of pending transactions so instead of sending the block that has a list of transactions, send the block and a list of the IDs of which transactions should be in it. If you do that instead you end up sending like 10% of the amount of data and you reconstruct it when you receive that sketch of the block from the transactions you already have. That was made by a lot of work and ends up significantly reducing the bandwidth because you’ve already sent the transactions that are in the block, that’s all that that is.
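The reconstruction step in that answer can be sketched in a few lines. This is a simplified illustration and not the real protocol: BIP 152 derives its 6-byte short IDs with SipHash and per-block keys, whereas this sketch just truncates SHA-256, and the names `short_id`, `make_sketch` and `reconstruct` are made up for the example.

```python
import hashlib


def short_id(tx):
    """Simplified stand-in for a short transaction ID (truncated SHA-256)."""
    return hashlib.sha256(tx).digest()[:6]


def make_sketch(header, transactions):
    """Sender side: announce the header plus one short ID per transaction."""
    return header, [short_id(tx) for tx in transactions]


def reconstruct(sketch, mempool):
    """Receiver side: rebuild the block from transactions already held.

    Returns (header, transactions, missing), where `missing` lists the
    positions the receiver still has to request from the sender.
    """
    header, ids = sketch
    by_id = {short_id(tx): tx for tx in mempool}
    transactions = [by_id.get(i) for i in ids]
    missing = [pos for pos, tx in enumerate(transactions) if tx is None]
    return header, transactions, missing


# Example: the receiver already has two of the three transactions.
block_txs = [b"tx-a", b"tx-b", b"tx-c"]
sketch = make_sketch(b"header", block_txs)
_, rebuilt, missing = reconstruct(sketch, [b"tx-c", b"tx-a", b"tx-z"])
print(missing)  # [1]: only tx-b has to be fetched
```

The bandwidth saving comes from the sketch carrying a few bytes per transaction instead of the transaction itself. Only the positions reported in `missing` cost an extra round trip.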
Q - Do new contributors tend to stick around or do you have a lot of people who contribute and then leave?
A - Recently we’ve had a couple of people coming in who are looking at doing more PM type work to I presume keep better metrics on how people are getting involved in that way and if they stick around or not, what the bounce rate is. I don’t know of anybody who has that. What I would say is right now there are more people coming into Core per day than I’ve ever seen before. There are currently about forty people who would be considered core developers more so than just a contributor who has contributed once. People who are actively involved, maybe about forty people and they’ve been around for a while for the most part. New people come in pretty often and I think that once you make it into the set that are considered active you end up staying for a while. I don’t know exactly how many, I just know that recently I’ve seen a very big uptick in the number of people who are trying and making a pretty good effort.
Q - Do you think there is enough work for people who are doing things like documentation, I wouldn’t say janitorial but clean up work in the overall codebase to keep somebody invested in the project for a long time?
A - You mean if somebody just wants to do documentation type stuff? There definitely is. I think also if that is your preference for working on that kind of stuff there are a lot of things that need to be done in the usability direction. As far as I know and I could be wrong, nobody has ever done a usability study on Bitcoin Core, never. That isn’t great. Dan in the back might know. Have you done a usability study on the Bitcoin Core wallet? No one has ever done it. I have a question. How would somebody who doesn’t have hands let’s say, how would they use Bitcoin Core? I want them to be able to use it, it is not my expertise to be able to figure that out. No one has ever tried. I think that making it a lot more accessible fits in with documentation. I think you can find endless things to do with documentation of “How do you make Bitcoin compatible with screen readers?” I haven’t seen anybody do that yet but there are people who spend time on that and you will find a lot of work cut out for you if you want to do documentation.
Q - Another question is more of a fun question. What was the most surprising thing you found out about the codebase or the process compared to your expectations before you came in? You had this image of this project then you come in and you discover something that shocks you?
A - It has changed a lot over the years and I think my initial shocks would be less true now. A part of it is how bad the codebase is. You look at it and you go “The way that this is done is just weird. What was Satoshi thinking when Satoshi wrote that?” I don’t know, he was just trying to get it done so he got it done and he launched it. He obviously succeeded so who am I to judge? That is what is fun about it because you go “This thing is obviously wrong. Whoever did this needs to be taken out back and shot” and you fix it. There’s obviously work cut out. That’s where sometimes I’ll go, especially for performance related things, you’ll look at a line of code and go “If you wanted to write a slower way of doing this you couldn’t.” Then you’ll go “Now I’m going to write the fast way.” To be totally fair if I were writing the code for the first time ever that anyone had ever written that code I probably would’ve written the slow one too. Hindsight is 20/20.
Q - As a follow up on that. Let’s take your example where you try the whitelist RPC calls. It seems that it would be useful for Litecoin and Dash so would you submit that to the other projects?
A - I probably would not submit to them. If they are regularly downstreaming patches they can get it themselves. If they want to pay me I’m unemployed they can give me some money and I’ll get it in there for them. In general, I’m just going to put it into Bitcoin. It would be useful for them too. Everyone should maybe have it. But there are a lot of things that are good in Bitcoin that maybe everyone should have but they haven’t yet taken. It is really difficult to rebase a whole codebase. Zcash, I think this is the case, runs on Bitcoin 0.12 versus the latest one which is 0.16 or something like that. They could be running with lots of new features but it is really hard to take everything they’ve done on top of Bitcoin Core 0.12 and move it over. They may not ever do that.
Q - You touched on something a couple of times in terms of currently being unemployed. Blockstream aside, what would you say is the priority incentive for people to contribute to the codebase?
A - I would say the set of incentives is similar to being a professor. What’s the incentive for being a professor? I don’t know, you want to do it. Economically there are much better pathways for someone with that ability. Timewise, lifestyle wise there are also better pathways but you want to do it because you want to teach people or you want to have an impact on this thing. You care about Bitcoin, you care about the mission. That is ultimately the incentive. There definitely could be much more money flowing into Core development than there is right now and I know that there are a lot of people who are working on that. It is one of those things. The number of contributors who could be working full time on this is growing much faster than anyone is allocating capital into people working on Core development. I think part of it is people underestimate how critical it is for this stuff actually making it. I would say right now it is like someone is spending a lot of time thinking about what furniture is going to go on the space capsule versus thinking about “Are we sure our rockets aren’t going to blow up on the launchpad?” We need to do a lot more to make sure these networks are really hardy and really distributed and actually going to work. That’s a lot of effort but right now people are focused a little bit more on what the furniture is going to look like problems. What color is the rocket ship?
Q - Isn’t that question of motivation one of the things that differentiates people who contribute to Bitcoin Core versus a lot of other projects? Bitcoin didn’t get ICOed. It feels like it is a very different incentive structure because people are more driven by internal motivations, intrinsic not extrinsic?
A - I won’t name and shame with this one. One of the most important Bitcoin Core developers told me that he once had a million dollars on Mt Gox but it was cash not Bitcoin. It was lost and now he has no Bitcoin but he still works on it. For most people, everyone theorizes that the network should be incentivized because everybody has some Bitcoin and they think it is going to make them wealthy if they make it more performant in some way. That is definitely true for some people but not for everyone. There are a lot of people who do it because they find it gratifying for other reasons. There’s altruism or selection bias for people who are stupid in the way they decide to spend their time.
Q - Given the disparity, outside of Blockstream, between money invested in to support the lifestyle of open source devs versus the furniture on the spaceship projects, what do you feel breaks that trend? Obviously it is not going to last. Is there any event or thing that you see will come up and break the current trend of money flowing into nonsense?
A - Probably not. One of the things to stick in the metaphor, I love bad analogies, when you decide that you need these expensive, Italian, leather sofas in your spaceship that weigh a thousand pounds you realize you need to make it cheaper to send heavy things. I’ve definitely heard people theorize that in order to attribute enough capital to these commons people are only willing to allocate a tenth of a percent, you need to have like a trillion dollar industry to put enough capital into the bottom layers. That is an upwards pressure on the stupid s*** that people put money into. They make some small accidental investment in the base layers. I think that that will be a part of it. I don’t like that version of it, I think we can do better if we have some more personal responsibility for these things. We’ll see how it evolves. Things are going in a better direction right now than previously.
Q - For that analogy that you were just talking about, can you give a more concrete example? What is the sofa in Bitcoin Core?
A - Sure. For example, there has been a lot of money spent on colored coins if you are familiar with those, tagging an asset. That’s like cool stuff and this was a while ago so it is maybe not the most recent example. There have been a couple of companies that have raised reasonable amounts. Probably more than has been directly invested in Bitcoin Core development for those companies. They’ve all failed because the stuff doesn’t scale well enough. I think that that is maybe a reasonable example. It is not as bad in Bitcoin. If I can put my maximalist hat on, Ethereum, we put all this money into Ethereum. It is all these amazing applications that are never going to work in that decentralized environment unless they figure out all these other hard problems. Until that happens we don’t even have the rocket ship yet. Internally it is not as bad, externally if you look at the Bitcoin dominance index, everything else is to a certain extent frothy, unproven, not necessary. I think the things that are important are more research on consensus. My disclaimer, I’m an advisor to Stellar. I advise them because I think having more diversity in consensus algorithms is actually good. That is the base layer, that is the rocket ship that we are all riding on. I think other things like that, the basic crypto, what is a transaction? How are we doing signatures? What’s our privacy like? These are all basic things that need investment more so than Cryptokitties or something like that.
Q - On the privacy side what kind of ideas are being worked on?
A - There’s actually a lot of really cool stuff in privacy right now. One of the recent, really fundamental breakthroughs is something called bulletproofs. Bulletproofs are a way of doing what’s called a range proof. This has an application in something called confidential transactions. This is where you mask all the amounts that are being sent in a transaction so it just looks like noise. You’ll be able to see who sent, you’ll be able to see who got but you wouldn’t be able to see how much they sent or how much they got. This is not as strong as something like Zcash but it is much higher performance in terms of needed bandwidth. That’s one of the really major things. Bulletproofs make it usable. One of the other major things happening right now is something called taproot and graftroot which enable fancy smart contracts in Bitcoin without loss of privacy over what those fancy smart contracts were. I’m really not sure that something like bulletproofs will actually make its way into Bitcoin because it is a complicated change. For the most part you can always just tell someone here’s how much money I was transacting. Nothing is stopping you from publishing it anyway even if it were mandatory to use it. Even in Zcash I can prove to you what my transaction was so all these things are always kind of optional. Unless you have strong deniability in your crypto. Then no one might believe you. You might be able to prove that you can’t have deniable transactions, I don’t have a proof for that. If you did you could falsify things. I don’t know, to be determined.