Discussing 0.21.0 Bitcoin Core Vulnerability Disclosures
Speakers: Gloria Zhao, Niklas Gögge
Date: July 31, 2024
Transcript By: kouloumos via tstbtc v1.0.0 --needs-review
Tags: Bitcoin core
Introduction
Speaker 0: 00:00:00
Hello, Niklas.
Speaker 1: 00:00:02
Hi, Gloria. We’re here to talk about the next batch of disclosures for Bitcoin Core. And this time there are only two bugs, and they were fixed in version 22. So that means version 21 was still vulnerable to these two bugs. If you’re running version 21, you should upgrade. I mean, you can listen to this podcast and decide for yourself if you want to upgrade, but my recommendation, our recommendation, would be that you upgrade. Next month, we’ll be doing bugs for version 22. So ideally, you upgrade to one of the maintained versions, the lowest of which is currently 25. So you should upgrade to 25, 26, or 27.
Speaker 0: 00:00:50
Right. So yeah, spoiler, but don’t just upgrade to 22.
Speaker 1: 00:00:58
Yeah, because you’re just doing the same thing every month.
Speaker 0: 00:01:02
OK, great. So why don’t we dive into the disclosures.
Background on Bitcoin peer-to-peer address relay
Speaker 0: 00:01:07
The first one we wanted to talk about was the addr relay one, right? So let’s give a bit of background for address relay. So the problem here is when you start up your Bitcoin node, you need to connect to the network, and the network consists of anonymous, pseudonymous nodes. And there’s a few things that make this kind of difficult. One is that the active nodes on the network and their IP addresses or onion addresses change on like a per-second basis, right? So if we tried to release software where we went and scraped all of the addresses of the active nodes on the network and tried to publish that along with the Bitcoin Core software, that would be outdated really, really quickly. And the second thing that makes this really difficult is, of course, because these peers are anonymous, you might imagine that there are malicious peers, and when they tell you about addresses, they may give you the addresses of just nodes that they control. And that can be really, really dangerous because if you join the network and let’s say you connect to 20 other nodes and all of them are controlled by this one person who is, say, trying to attack you, you’ve been eclipsed. And something that they can do is they can either withhold the current main chain from you so you’re not aware of new transactions and new blocks that come out, or they can even feed you an alternative chain where, for example, you got paid when you didn’t actually get paid in the real Bitcoin blockchain. And so it’s very important that your contact book, your address book, has hopefully honest, good peers in there. And so, on the network, nodes will gossip addrs to each other, addresses. So this is different from your Bitcoin address, like bc1, blah blah blah. These are network addresses, like IP addresses or onion addresses. So just make sure that you don’t confuse those. We’re going to call them addrs. We have addr gossip.
And like I’m alluding to, it’s an example of data that’s really crucial, really important to the functionality of your node. But there’s no meaningful way for us to, say, attach a proof of usefulness or proof of correctness to these pieces of data. So unlike blocks that all have proof of work on them, someone can send you, let’s say, thousands of addrs that are all incorrect. They don’t actually correspond to real nodes or they correspond to malicious nodes, et cetera. But we still need to make sure that we optimistically kind of download this information and handle it in some way that allows us to still connect to the honest part of the network and hopefully do damage control if there is someone spamming incorrect or malicious addrs. So do you wanna continue with AddrMan?
Speaker 1: 00:04:28
Sure.
Bitcoin Core’s AddrMan (address manager) data structure
Speaker 1: 00:04:29
So one part of the solution that Bitcoin Core has is a data structure called the AddrMan, which basically is a database for addresses you’ve seen, and it also sort of has some heuristics for deciding which addresses are good, which addresses you’ve tried to connect to. So it doesn’t just keep around all addresses that you’ve ever seen, but rather tries to keep around a selection of good addresses and potential addresses to try to connect to in the future.
Speaker 0: 00:04:59
Right, and we’re not going to immediately try addrs that are sent to us. A, because it’s quite expensive, and B, it could be a perfectly good address, but they just happen to not be online right now.
Speaker 1: 00:05:13
Yeah, so over time, as we have a need for new connections, we occasionally try connecting to addresses stored in the AddrMan. I think there’s actually a specific connection type for this, right? The feeler connections? Right.
Speaker 0: 00:05:32
Yeah, feelers. Yeah.
Disclosure of remote crash due to addr message spam
Speaker 1: 00:05:36
Anyway, this bug is concerned with some internals of the AddrMan. So it has this internal count of how many addresses have been inserted into the AddrMan, like in total, and basically this variable is increased every time you insert an address. And it’s a 32-bit integer, which means at most you can insert 2 to the 32 addrs into AddrMan before this variable overflows. Now why is it bad if that variable overflows? AddrMan has internal consistency checks, which basically take all the state it has and make sure that it’s all consistent and doing the right thing. But if this ID count overflows, then the consistency checks fail. And the way Bitcoin Core handles the consistency checks failing is crashing the node. And I think you could have a discussion around if that’s the right approach, but I think with regard to like eclipse attacks or whatever, it’s probably good that you just crash instead of continuing to run with like an inconsistent addr database. But yeah, so essentially if you’re an attacker, you could spam a node with a bunch of addresses. Pretty much, you know, 2 to the 32 addrs you would have to send, which is, I’m not sure how many gigabytes, but like multiple. I should probably know how many gigabytes. It’s like 10 to 20 gigabytes or something, I think, that you have to send. And then your victim would overflow its counter and crash.
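The overflow described in this turn can be illustrated with a minimal Python sketch. It is not Bitcoin Core's code: `ToyAddrBook`, `counter_bits`, and `is_consistent` are hypothetical names, and the counter width is shrunk to 8 bits so the wrap-around can be reached in a few hundred insertions instead of 2 to the 32. The consistency rule used here (every stored id must be below the counter) is an assumption chosen for illustration.

```python
class ToyAddrBook:
    """Toy address book with a fixed-width insertion counter (hypothetical)."""

    def __init__(self, counter_bits=32):
        self.modulus = 1 << counter_bits  # counter wraps at 2**counter_bits
        self.id_count = 0                 # emulates a fixed-width integer
        self.entries = {}                 # id -> network address

    def insert(self, addr):
        # Assign the current counter value as the id, then bump the counter.
        self.entries[self.id_count] = addr
        self.id_count = (self.id_count + 1) % self.modulus

    def is_consistent(self):
        # Assumed invariant: every id handed out so far is below the counter.
        return all(entry_id < self.id_count for entry_id in self.entries)


book = ToyAddrBook(counter_bits=8)
for i in range(255):
    book.insert(f"addr-{i}")
ok_before = book.is_consistent()  # counter is 255, ids are 0..254

book.insert("addr-255")           # counter wraps from 255 back to 0
ok_after = book.is_consistent()   # ids up to 255 now exceed the counter
```

In this toy model the check passes until the counter wraps and fails immediately afterwards, which corresponds to the point where, per the discussion above, the node would crash rather than continue with inconsistent state.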
Speaker 0: 00:07:16
So, yeah, 2 to the 32 is a really big number. How would you contrast that with the expected amount of addr gossip you’d receive?
Speaker 1: 00:07:28
Yeah, so if you establish a connection initially, you’ll ask your peer for addresses using a getaddr message. So you send the getaddr and then you get a bunch of addresses in return. I think that’s limited to a thousand addresses, and that only happens once per connection, at the start of the connection essentially. And then there’s also this like casual addr gossip, where if you receive addresses, you will relay some of them to a few of your peers. And I’m not sure what the average or expected number of this casual relay is. But prior to the fix for this bug, there was no rate limiting on this. So you could just spam as much as you want.
Speaker 0: 00:08:19
Well, I think there was rate limiting for how much we would forward. Yeah. But yeah, anything that was sent.
Speaker 1: 00:08:25
You would just process anything that you get.
Speaker 0: 00:08:28
Right. Right. And there were other things in place to try to protect against just being spammed with malicious data, of course. But in terms of this number, we were okay with being sent a lot of addrs.
Speaker 1: 00:08:46
Yeah.
Address spamming observed on the network
Speaker 1: 00:08:46
And I think what’s interesting is the fix basically introduced the rate limiting, but then once the PR was opened, we saw someone starting to spam addresses on the network.
Speaker 0: 00:08:57
Right. Yeah, yeah, yeah.
Speaker 1: 00:08:58
Which at the time, the people handling this issue weren’t sure if someone was trying to exploit the crash bug or if there was some other behavior that they were trying to abuse. Basically, we were wondering if the rate limiting PR triggered someone to abuse something that they would no longer be able to use after the PR was merged.
Speaker 0: 00:09:20
Right, I think I remember, I think Martin was talking about it, right? And was it just like a coincidence? Like, wasn’t it KIT doing research?
Speaker 1: 00:09:33
Well, I think the conclusion was that someone was doing research, and someone made a guess as to what it might be, and it has something to do with estimating the number of connections a node has.
Speaker 0: 00:09:47
Right. Yeah. Yeah.
Speaker 1: 00:09:49
Like estimate the degree of a node.
Speaker 0: 00:09:53
Yeah. I think if you go to the KIT node stats website, they have all kinds of stuff, like all of the addrs that are reachable on this day or something like that. Or, you know, the number of addrs that people are talking about that day. I’m not sure. But so, sorry, I still don’t understand. Was it a coincidence or was it?
Speaker 1: 00:10:18
I don’t think we know exactly who did it. We only have like a guess as to what it was for.
Speaker 0: 00:10:25
Okay.
Speaker 1: 00:10:25
But I don’t think there was ever any. So there was a small paper published by someone trying to make a guess. I think that was the KIT people. They were trying to guess what was happening. But the actual, whoever did it never released anything as far as we know.
Speaker 0: 00:10:44
Was it just somebody testing the PR?
Speaker 1: 00:10:47
It could also be. Well, maybe.
Speaker 0: 00:10:50
No, that seems like a weird way for someone to test it.
Speaker 1: 00:10:53
But yeah, that was kind of a fun fact about this fix.
Bitcoin Core #22387 PR to fix addr message spam
Speaker 0: 00:10:56
Okay. Do we want to talk more about the PR that fixed it?
Speaker 1: 00:11:01
Sure. I think you might be better to lead on that because I did not.
Speaker 0: 00:11:07
I think I remember us reviewing this PR actually, but it was like us together. Yeah. With what was it not?
Speaker 1: 00:11:14
I mean, it’s a long time ago. Yeah. I do think I remember looking at it.
Speaker 0: 00:11:19
Me too, but I don’t really remember much in it. It’s like a token bucket style rate limiting for addrs that people send to us. So as Niklas already enumerated, we have kind of solicited addrs. When we connect, we’re like, oh, we’re friends now. Who are all your friends? We have kind of unsolicited, random, casual gossip, as you said, and then we also have self-advertisements. And I think that covers all of the addr announcements that we’ll send, at least as an honest Bitcoin Core node.
Speaker 1: 00:11:59
Actually, in the PR description, Pieter has an estimate for like expected or average addresses per second. And he says it seems to vary from like 0.005 to 0.025 addresses per second. Right. Which is obviously pretty low. So the PR puts a rate limit of 0.1 addresses per second, but it also allows a burst of a thousand addresses at once to allow for the getaddr response.
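To put the quoted numbers side by side, here is a small back-of-the-envelope calculation in Python. It only uses figures mentioned in the conversation (a 2 to the 32 counter limit and a 0.1 addresses per second rate limit) and makes simplifying assumptions that are not from the source: a single peer, no burst allowance, and a 365-day year.

```python
COUNTER_LIMIT = 2 ** 32                 # insertions before a 32-bit counter wraps
RATE_LIMIT_ADDR_PER_SEC = 0.1           # per-peer rate limit quoted above
SECONDS_PER_YEAR = 365 * 24 * 60 * 60   # assumption: 365-day year

# Time one peer would need to push 2**32 addresses through the rate limit,
# ignoring the one-off burst of 1000 addresses.
seconds_needed = COUNTER_LIMIT / RATE_LIMIT_ADDR_PER_SEC
years_needed = seconds_needed / SECONDS_PER_YEAR

print(f"{years_needed:.0f} years per peer")
```

Under these assumptions a single rate-limited peer would need on the order of a thousand years to reach the counter limit, which gives a feel for why the rate limit makes the overflow impractical.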
Speaker 0: 00:12:37
Right, I think the token bucket stuff doesn’t start until that. So we do keep track of whether or not we’ve sent and received a response for a getaddr. Right, so as any token bucket mechanism works, each peer is given a token, well, it’s just a counter. It’s just a counter and a map. An amount of tokens for addrs you can send. Each time you send us an addr, we take tokens out of your bucket. When you’re out of tokens, any addrs you send, we just drop on the floor. And more importantly, we don’t increment this 32-bit counter. And that’s it. I think it’s fairly simple. Yeah. Okay. Should we move on to the next one?
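The token bucket idea described here can be sketched in a few lines of Python. This is an illustrative model only, not the code from the PR: the class name `AddrTokenBucket`, its parameters, and the choice to start each peer with a full bucket of 1000 tokens are assumptions made for the example. The rate (0.1 per second) and burst size (1000) are the figures quoted in the conversation.

```python
class AddrTokenBucket:
    """Per-peer token bucket for incoming addrs (illustrative sketch)."""

    def __init__(self, rate_per_sec=0.1, capacity=1000.0,
                 initial_tokens=1000.0, now=0.0):
        self.rate_per_sec = rate_per_sec
        self.capacity = capacity
        self.tokens = initial_tokens  # assumption: bucket starts full
        self.last_refill = now

    def _refill(self, now):
        # Add tokens for the elapsed time, never exceeding the capacity.
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.rate_per_sec)
        self.last_refill = now

    def process(self, addrs, now):
        """Return the addrs that are accepted; the rest are dropped."""
        self._refill(now)
        accepted = []
        for addr in addrs:
            if self.tokens < 1.0:
                break  # out of tokens: everything else is dropped
            self.tokens -= 1.0
            accepted.append(addr)
        return accepted


bucket = AddrTokenBucket()
# A burst of 1500 addrs at connection time: only 1000 fit in the bucket.
first = bucket.process([f"addr-{i}" for i in range(1500)], now=0.0)
# 55 seconds later the bucket has refilled by about 5.5 tokens.
second = bucket.process([f"late-{i}" for i in range(20)], now=55.0)
```

In this sketch only the accepted addrs would ever reach the address database, so dropped addrs cannot advance any insertion counter, which is the property emphasized in the turn above.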
Speaker 1: 00:13:27
Sounds good. Oh.
Speaker 0: 00:13:32
I think before we move on, we should do the credits.
Speaker 1: 00:13:35
Oh, yeah.
Speaker 0: 00:13:35
Just to make sure that we credit everyone. So credits to Eugene Siegel for discovering and then Pieter Wuille for fixing the bug.
Background on Miniupnp, the UPnP library used by Bitcoin Core
Speaker 0: 00:13:46
Why don’t we move on to the UPnP one, which, if you listened to our last episode, is our favorite dependency of Bitcoin Core. Let’s see, so yeah, we gave some background in the last episode, but in order to receive inbound connections for a Bitcoin Core node, we have to set up port forwarding. And we have miniupnpc as a dependency to help do that automatically. It’s off by default because of the disclosure from last episode. But it is, even today, a dependency of Bitcoin Core. So what needs to happen before you set up port forwarding is you have these two devices, you have your router and you have your Bitcoin Core node that are on the same local network, and they need to talk to each other to say, hey, can you set up, you know, port forwarding for 8333 for me? Before that, they need to discover each other. So, I think part of the UPnP stuff is, what will happen is they’ll send out an M-SEARCH announcement, and then devices on the network will see that announcement and reply with like, hey, I’m this device, here’s my something number and some information about me. And then hopefully that’s how we discover the router and then we’ll be like, okay, I’m talking to you, please do this for me.
The bug in Miniupnpc
Speaker 0: 00:15:18
However, there was a bug in miniupnpc where there’s this loop where it listened for responses to the M-SEARCH and it would just add new devices to its list of devices. And I think the way the loop is written, it was assumed that the loop would terminate. But I think…
Speaker 1: 00:15:45
I think the assumption was that you stop receiving data at which point the loop exits.
Speaker 0: 00:15:51
Right. Yeah. Or I think I want to say it’s on some kind of time interval as well. So it was like, we’re not going to receive a million in one second. I can’t remember. Anyway, someone upstream discovered that this can loop infinitely. I think they plugged in like a mouse or something that was just very, very aggressive in broadcasting their device info or something. And they discovered that this looped infinitely.
Disclosure of the impact of an infinite loop bug in the miniupnp dependency
Speaker 0: 00:16:31
And so, Michael Ford, fanquake, was monitoring upstream where this bug was discovered and fixed and was like, okay, the implication for Bitcoin Core here is when Bitcoin Core kind of does this, if there’s a local device on the network, any local device, that does kind of the same behavior where it’s just like spamming these M-SEARCH responses, then Bitcoin Core, because it uses this library, will also go in this infinite loop. And within this infinite loop, each time you add a device to your list, you’re going to allocate some space on the heap to store the strings and whatnot, the data that was in the reply. And so I think fanquake wrote a script to do this with like unique numbers and like a very large string, something like that. And then eventually you would OOM within this infinite loop where you’re continuously allocating space. And so, put this all together, the upstream vulnerability means that your node may crash if there’s something on the local network that is doing something like this.
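The failure mode described here, a loop that keeps appending replies for as long as replies keep arriving, can be modeled with a short Python sketch. This is not miniupnpc's code and not the upstream patch: `collect_responses`, `chatty_device`, and the `max_devices` cap are hypothetical names, and capping the list is just one generic way to bound such a loop.

```python
import itertools


def collect_responses(responses, max_devices=None):
    """Collect discovery replies until the stream ends or the cap is hit."""
    devices = []
    for reply in responses:
        if max_devices is not None and len(devices) >= max_devices:
            break  # bounded: stop collecting once the cap is reached
        devices.append(reply)  # each stored reply costs more memory
    return devices


def chatty_device():
    """Simulates a local device that never stops sending unique replies."""
    for n in itertools.count():
        yield f"device-{n}: " + "x" * 64


# A well-behaved network goes quiet, so the loop ends on its own.
found_quiet = collect_responses(["router", "printer"])

# With a cap, even an endless stream of replies is handled safely.
found_capped = collect_responses(chatty_device(), max_devices=100)
```

Calling `collect_responses(chatty_device())` without a cap would never return and would keep allocating memory, which mirrors the out-of-memory crash described in the turn above.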
Bitcoin Core #20421 PR to fix the infinite loop bug in the miniupnp dependency
Speaker 0: 00:17:49
So yeah, the fix, of course, because this was fixed upstream, was to update the dependency to use the fixed version of it, which fanquake also did. And that’s it. Anything to add?
Speaker 1: 00:18:05
Well, I guess we can give credits to Ronald Hoveniers, who initially discovered the infinite loop bug and reported it to the miniupnp project. Yes. I hope I pronounced the name correctly. And then Michael for realizing what the impact on Bitcoin Core is and providing a proof of concept.
Speaker 0: 00:18:25
Right. So thanks. Thanks to everyone who did that. All right. Is that the end of this episode?
Speaker 1: 00:18:35
I think so. Yeah. Did we learn anything?
Lessons learned
Speaker 0: 00:18:47
So I think following on the themes of last episode, I remember we talked a lot about kind of lower hanging fruit, such as, oh, like there’s no bounds on this data structure. And I think the rate limiting here, it’s not really low hanging fruit, but it is along those lines of, at least I remember when I was thinking about orphanage for my current project, like kind of a very similar idea where we have this data structure and like we want to make sure honest peers use it, but we’re also fully aware of there potentially being malicious peers that send completely unverifiable data. Like we have no idea whether this is valid data or not. We just have to kind of allow it where the solution is to kind of have this token bucket style per peer rate limiting. And that’s a technique that I learned from this.
Speaker 1: 00:19:56
I see. I see where that came from then.
Speaker 0: 00:20:02
Your turn.
Speaker 1: 00:20:03
Yeah. I get like, I don’t know. I think for our dependencies, we could probably do a better job of trying to keep them up to the standard that we expect from the code that we write ourselves. Yeah. But yeah, I don’t think anybody has really reviewed miniupnpc, like from our side.
Speaker 0: 00:20:26
Well, fanquake has. I’m just kidding. It’s a joke.
Speaker 1: 00:20:29
But… Well, I guess he found the issue.
Speaker 0: 00:20:33
Yeah, so I asked him before we did this, how did you, are you just subscribed to everything? And he says, well, I just kind of keep tabs where I periodically look at them. And he’ll kind of flag slash complain every once in a while that certain dependencies are exhibiting some not great behavior. I mean, like this morning he was complaining that someone was opening all these refactoring PRs in libevent, right? And I guess it kind of depends on the maintenance slash contribution style or culture of the dependencies. But yeah, I guess all we can do is minimize.
Speaker 1: 00:21:28
Yeah, minimize or sort of maintain them ourselves.
Speaker 0: 00:21:32
Right. Oh, so we have a few, we can talk about, like we have some dependencies that we’ve subtreed. Right, so there’s like this big library, it’s like a hundred thousand lines of code, and we haven’t reviewed all of it. We’re not really sure as to how…
Speaker 1: 00:21:48
Which library?
Speaker 0: 00:21:50
I’m thinking of…
Speaker 1: 00:21:52
I’m thinking of Boost. Oh, Boost, yeah.
Speaker 0: 00:21:56
But also, isn’t UniValue something that we subtreed?
Speaker 1: 00:21:59
Yes.
Speaker 0: 00:21:59
Right, and UniValue is part of a larger, no?
Speaker 1: 00:22:03
But I think we’ve, well, I don’t know. I’m not too sure about this. But I think we’ve kind of taken over UniValue. We’ve made some changes that are not upstream. So I think our subtree sort of just became, it’s like integrated sort of. It’s just our code now essentially.
Speaker 0: 00:22:21
Right, right. Which I think is like something other people would hate. Like speaking of, we were talking about this earlier, I was browsing kind of engineering principles documents of various companies, where they’re like, you know, move fast and break things. And like, we deliver value to the users and stuff. And one of the biggest things they’ll talk about is like, don’t rewrite things, don’t reinvent things that already exist. Just build upon stuff that’s already there, because, you know, you don’t waste time basically on things.
Speaker 1: 00:23:01
I mean, yeah, if you’re moving fast, or if that’s your goal, then that obviously makes sense.
Speaker 0: 00:23:06
Yeah. And then here we are, where we’re like, yeah, no. Like, we took something that existed. We decided to make a copy of it and put it in our tree. And now we’ve like made our own changes and like customized it.
Speaker 1: 00:23:25
Which also, I’m not sure if it’s true for UniValue, but we might just be forced to do it because upstream it’s unmaintained. We need to make certain changes. Like maybe we want to upgrade to a new C++ version and the dependency needs to move and there’s no maintainer. So we just absorb the dependency essentially and make it our own thing.
Speaker 0: 00:23:49
Yeah. Which I think is, again, I’m just saying that it’s quite rare.
Speaker 1: 00:23:54
I think.
Speaker 0: 00:23:56
Yeah. Welcome to security-oriented engineering. All right. Thank you for listening. We’ll be back again in a month for the next batch of disclosures.
Speaker 1: 00:24:08
If you have any questions, let us know on Twitter or X.
Speaker 0: 00:24:13
Okay. Bye. Bye.