Discussing Pre-25.0 Bitcoin Core Vulnerability Disclosures
Speakers: 0xB10C, Niklas Gögge, Gloria Zhao
Date: October 10, 2024
Transcript By: kouloumos via tstbtc v1.0.0 --needs-review
Tags: Bitcoin Core, P2P, Compact block relay
Introduction
Speaker 0: 00:00:00
Hello there, welcome or welcome back to the Brink podcast. Today we’re talking about some security advisories again that were released a few days ago. I have with me B10C and Niklas. Feel free to say hi.
Speaker 1: 00:00:15
Hey Gloria, hey Niklas. Hello.
Speaker 0: 00:00:18
Great. So on the agenda today, we’ve got three vulnerabilities. A good mix, but all peer-to-peer. So first we’re going to talk about the headers pre-sync bug, which was actually released last month. And then we’re going to talk about the inv send queue bug and then the compact block BLOCKTXN crash. Does that sound good to you guys?
Speaker 2: 00:00:44
Sounds good.
Speaker 0: 00:00:45
Great.
The DoS vulnerability in headers sync
Speaker 0: 00:00:47
So headers pre-sync. This is a vulnerability that’s been known for quite a long time, but I think what changed was kind of how easy we thought the attack would be to pull off and our philosophies around how we should patch it. So the general problem statement would be: okay, you’re a new node, you’re joining the network, you’re connecting to all these peers that are anonymous, and you need to sync the main chain. You need to do initial block download, or IBD, but that involves of course downloading hundreds of gigabytes of data at best to go and validate and figure out where the main chain is. And so you have to consider the possibility that these are malicious peers, you don’t know who they are, and they could just send you hundreds of gigabytes of data that is total garbage and doesn’t actually lead to the Bitcoin main chain, is some other chain, or is just garbage, right? And the nice way to prevent, like a very natural way to prevent spam in this situation is to use a proof-of-work metric, because it’s very difficult to fake blocks with high proof of work, as, you know, that’s our consensus mechanism. And so since 0.8, which was quite a few years ago, we’ve been doing headers-first sync. This is different from headers pre-sync, which is what we’re gonna be talking about, but this we’ve been doing for a long time, where we’ll first download the 80-byte headers chain and figure out, okay, this one is the most proof of work. Now that I’ve figured out what the chain is from the 70 to 80 megabytes worth of data, I have that in memory, and now I’m gonna go and start downloading those blocks, and the blocks I will write to disk. However, even with this in place, you can imagine that somebody is sending you, and we have hundreds of thousands of headers in the main chain, like the actual chain is about 70 to 80 megabytes again, but I could send you lots and lots of headers that don’t actually go anywhere, right? Like, it looks like maybe the difficulty is getting kind of high, but it never quite gets to the point where it is the main chain.
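To make the proof-of-work metric concrete, here is a minimal sketch (illustrative types only, not Bitcoin Core’s actual arith_uint256-based code) of how a node can tally the total work claimed by a chain of 80-byte headers before fetching any blocks:

```cpp
// Minimal sketch (not Bitcoin Core code): headers-first sync keeps only the
// 80-byte headers in memory and picks the chain with the most accumulated
// proof of work before downloading any full blocks.
#include <cstdint>
#include <cmath>
#include <vector>

struct BlockHeader {            // 80 bytes on the wire
    int32_t  version;
    uint8_t  prev_hash[32];     // link to the previous header
    uint8_t  merkle_root[32];
    uint32_t time;
    uint32_t nBits;             // compact-encoded difficulty target
    uint32_t nonce;
};

// Approximate work for one header: ~2^256 / (target + 1).
// Real implementations use 256-bit integers; a double is enough to illustrate.
double HeaderWork(uint32_t nBits) {
    const uint32_t exponent = nBits >> 24;
    const uint32_t mantissa = nBits & 0x007fffff;
    if (mantissa == 0) return 0.0;
    const double target = std::ldexp((double)mantissa, 8 * ((int)exponent - 3));
    return std::ldexp(1.0, 256) / (target + 1.0);
}

// Total work of a headers chain: the metric used to decide which chain of
// headers is worth fetching full blocks for.
double ChainWork(const std::vector<BlockHeader>& headers) {
    double work = 0.0;
    for (const auto& h : headers) work += HeaderWork(h.nBits);
    return work;
}
```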
Discussion of checkpoints in the code
Speaker 0: 00:03:08
And so traditionally, or for a long time, we’ve had checkpoints in place, where hard-coded in the code are the block hashes of blocks at certain heights. And so you can’t send, like, you know, billions of headers at no difficulty. And you know, that would be many, many gigabytes that would cause a node to run out of memory. So typically, when you’re using too much memory, like the OS will shut down your process or you’ll crash, it’s bad. When that happens to you as a user, basically, your node’s dead, and you have to go and manually restart it. But of course, you can imagine if somebody is able to pull off an attack like this, again, they’re sending you billions of headers and you’re running out of memory while you’re trying to do initial block download, then just over and over again, you’re trying to start your node, it’s syncing a little bit, and then it crashes. And so you’re basically locked out of the network. You can’t do Bitcoin. So checkpoints have been a solution that, A, we think is not as good as it should be, given kind of the cost of pulling off this attack, I’ll get into that a little bit more, but also philosophically, a lot of people disagree with kind of the idea that there are hard-coded consensus rules effectively that come from developers. And the last checkpoint was added, I think, 11 or 12 years ago. So it’s been a really long time since we’ve considered adding new checkpoints. However, the old checkpoints have remained in the code base and we’d like to delete them. But also, because the last checkpoint is so long ago, it is still feasible for an attacker to create a chain that starts with all the checkpoints in place and then grinds these headers to ramp down the difficulty and then produce lots and lots of headers after that. And this is not easy, you can’t just do this on your laptop. However, we can compare this to the amount of work that it would take to mine a block today, because it’s quite proportional. You can get a sense of how much hash rate you need to produce a block today, versus how much hash rate you need to produce this attack headers chain. And I don’t remember the exact numbers, but I think in 2022, when this attack was being talked about again, they were saying, oh, it’s about 14% of the hash rate you would need to mine a block. And you know, mining a block is not cheap, but this is also somewhat affordable. And so the idea was like, okay, let’s try to fix this properly. That’s the problem statement. I’m going to pause here to see if Niklas or B10C have any comments.
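Conceptually, a checkpoint check is just a hard-coded map from height to expected block hash; the sketch below is purely illustrative (placeholder heights and hashes, not the real values hard-coded in Bitcoin Core):

```cpp
// Illustrative sketch only: checkpoints are hard-coded (height -> block hash)
// pairs; a headers chain that disagrees with any of them is rejected, which
// blocks low-difficulty spam chains that fork off before the last checkpoint.
#include <map>
#include <string>

using BlockHash = std::string;  // hex string stands in for a 256-bit hash

const std::map<int, BlockHash> kCheckpoints = {
    {100000, "placeholder_hash_at_height_100000"},   // not the real values
    {200000, "placeholder_hash_at_height_200000"},
};

// Returns false if a header at a checkpointed height has the wrong hash.
bool PassesCheckpoints(int height, const BlockHash& hash) {
    auto it = kCheckpoints.find(height);
    return it == kCheckpoints.end() || it->second == hash;
}
```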
Speaker 1: 00:06:12
I think one thing to add is that over time, since 2022, it has become even easier to pull off the attack. Yeah, so good thing that this is fixed now.
Speaker 0: 00:06:26
Yeah, definitely good to get ahead of it. Because I think I saw a number that was this year, it went to like 4% of mining a block, which again is not cheap, but it’s a lot cheaper than you would hope for this kind of pretty severe attack that can crash anybody doing IBD. Well, you can,
Speaker 2: 00:06:47
you can crash post-IBD as well, because you can create the chain that forks off after the last checkpoint at any point in time and just feed it to a node, and then it’ll run out of memory. So it is not just during IBD. Although I think, yeah, I think the IBD reasoning was sort of publicly known already, and then the fact that you can do it after IBD as well was maybe more the secret part.
Speaker 0: 00:07:17
Ah, wow. So everyone on the network could have, like, if, you know, somebody went and did the work to produce this chain and started sending it out, I guess you’d have to individually connect to everybody.
Speaker 2: 00:07:30
Yes. And send them out. Yeah. So you would have, yeah, you have to connect to everyone and send them gigabytes of data individually.
Speaker 0: 00:07:39
Okay.
Speaker 2: 00:07:40
But you only have to mine the chain once, but you can send the same stuff to everyone.
Speaker 0: 00:07:44
Right. Okay. That’s interesting. So even though, right, even though we have what is the most work chain and they’re not going to produce a chain that has more work than the main chain that we’ve already synced to, however, we are going to download those headers from them just to see if it ends up being more work than our current chain.
Speaker 2: 00:08:06
Yeah, I think, I’m not sure if we, so I think the way it would work is, as the attacker, you would just keep sending the headers. I’m not sure if the node would actually ask for the headers. It might, but in any case, you could just feed the headers and, you know, the node would validate and store them if they are valid.
Speaker 0: 00:08:23
Interesting. So we allow unsolicited headers?
Speaker 2: 00:08:27
I think so. Yeah. Yes. I’m pretty sure, because headers can also be block announcements. So you do just validate them if you get them. Okay, wow. And you know, in case of a really large reorg, you would want to accept those headers as well. So.
Speaker 0: 00:08:44
Right, right.
Speaker 2: 00:08:46
Yeah, the node should be curious about alternative chains, but obviously if you can be misled into downloading a bunch of garbage, that’s not great.
Speaker 0: 00:08:54
Right. Yeah. It seems like I just threw out some kind of like naive straw man solutions basically, which is like, oh, you know, maybe don’t download too many, or only require, you know, them to be solicited. But those are bad trade-offs against our other goals, which is we really want to hear about headers and we want to hear about them quickly. Yeah. Like philosophically you want to switch to, you know, even if it’s a large reorg, philosophically we want to be able to switch to a different chain. Right.
Speaker 2: 00:09:32
If it ends up having more work.
Speaker 0: 00:09:34
Right. Theoretically, if we’re on an 800,000-header chain, and there is a legitimate billion-header chain out there somewhere, theoretically, our consensus slash peer-to-peer code should allow us to switch. Okay, but this is a difficult problem. Should I talk about the PR then to fix it? Because I remember we all reviewed it together actually.
Speaker 2: 00:10:00
Yes, I reviewed it. It was a while ago.
Bitcoin Core #25717 PR to fix the DoS vulnerability in headers sync
Speaker 0: 00:10:03
So yeah, I think I saw the details, but I didn’t review it at the time. Yeah, okay, so to recap: we’ll download all the headers and we don’t look at blocks until we’ve figured out our headers chain. However, somebody could send us a lot of headers that don’t actually go anywhere. So this PR, which is by Suhas and Pieter, I think, proposes a pre-sync stage, a headers pre-sync stage, where basically what you’re gonna do is download all of the headers from somebody twice. The first pass is to say like, okay, you’re sending me these headers, I’m not gonna keep them, but I just wanna see where you’re going with this. And of course, the headers refer to the previous block hash. So it’s a continuous chain. And then once we get to a point where I think, okay, yeah, this is the most work headers chain, but I’ve forgotten about it. So now I’m gonna re-download it from you. And this time I’m actually going to keep them. Now, of course, the glaring question there is, okay, what if they just send you a different chain the second time? So in the first phase, what you’ll do is store a very, very small commitment to it. I think it’s about one bit for every 580-something headers. So extremely, extremely compact, even smaller than, again, our normal headers chain, which is about 70 to 80 megabytes. But of course, even if someone sends you like billions of headers in this case, it’s still gonna be a reasonable memory bound. And so while this does take two passes, you’ll notice that if you started a node that’s like, I don’t know, version 20, it would have said pre-syncing headers. Sorry, not pre-syncing. It would say syncing headers dot dot dot. And then now if you start a node that’s more modern, it’ll say pre-syncing headers dot dot dot, and then syncing headers dot dot dot, because you’re doing it twice. However, even though we’re doing it twice, it all takes less than a minute. At least the last time I checked, it was very, very fast. So yeah, that’s the approach.
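As a rough illustration of the two-pass idea described here (a minimal sketch with illustrative names and constants, not the actual headerssync.cpp logic from PR #25717):

```cpp
// Minimal sketch of the two-pass "pre-sync" idea: pass 1 only tallies work and
// stores a 1-bit commitment every N headers; pass 2 re-downloads the same
// headers, checks each commitment bit, and only then keeps the headers.
#include <cstdint>
#include <vector>

struct Header { uint32_t nBits; uint8_t hash[32]; };

class HeadersPresync {
    static constexpr int kCommitPeriod = 584;   // illustrative; roughly one bit per ~600 headers
    std::vector<bool> m_commitments;            // tiny memory footprint even for billions of headers
    uint64_t m_salt = 0x5eed;                   // per-node secret so an attacker can't predict the bits
    int m_presync_count = 0;
    int m_redownload_count = 0;
    size_t m_verify_idx = 0;

    bool CommitBit(const Header& h) const {
        // Any keyed 1-bit hash of the header would do; XOR of the hash bytes is a stand-in.
        uint8_t acc = 0;
        for (uint8_t b : h.hash) acc ^= b;
        return ((acc ^ m_salt) & 1) != 0;
    }

public:
    // Pass 1 ("pre-sync"): tally work elsewhere, remember one bit per kCommitPeriod headers.
    void ProcessPresyncHeader(const Header& h) {
        if (++m_presync_count % kCommitPeriod == 0) m_commitments.push_back(CommitBit(h));
    }

    // Pass 2 ("re-download"): the peer must resend the same chain; every stored
    // bit must match before the headers are actually kept and connected.
    bool VerifyRedownloadedHeader(const Header& h) {
        if (++m_redownload_count % kCommitPeriod != 0) return true;
        return m_verify_idx < m_commitments.size()
            && m_commitments[m_verify_idx++] == CommitBit(h);
    }
};
```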
Speaker 2: 00:12:18
Yeah, it’s kind of sad that it, like downloading twice is the only way to do it. It would, it would be nice if the protocol somehow allowed for proving that, you know, like proving that you have a header that has enough work. But yeah, I don’t know. It seems like the only way of currently doing it is downloading twice.
Speaker 0: 00:12:36
Right, like if we could create these really compact proofs, like here I’m gonna send you block 860,000 or whatever first and it has this much work.
Speaker 2: 00:12:47
Yeah. I mean, yeah. I’d imagine there’s some fancy, you know, way of doing it, but you would probably have to significantly change the protocol.
Speaker 0: 00:12:57
Right. Maybe it would be even bigger than 80 bytes and cost even more compute.
Speaker 2: 00:13:05
I mean, I guess, hasn’t someone already made like a zero-knowledge header sync?
Speaker 1: 00:13:11
Right. There was something there. Yes. Yeah.
Speaker 2: 00:13:13
Yeah. Like that would maybe sort of go in that direction. Right. But obviously that’s-
Speaker 1: 00:13:17
I’m not sure if that solves the same problem, but yeah. Right.
Speaker 0: 00:13:23
Okay, cool. Anything to add or should we move on to the inv send?
Speaker 1: 00:13:28
I think that... I think there was one question before. Is this, like, memory only, or is this a disk space issue?
Speaker 0: 00:13:37
It’s on memory.
Speaker 1: 00:13:39
Memory only, right. I think the reason for this was that we don’t store the headers or blocks to disk at all before we flush, and we don’t flush during header sync.
Speaker 2: 00:13:51
Uh-huh. Yeah. I’m actually, I’m not sure. So we do, yeah, we do store the headers on disk at some point.
Speaker 1: 00:13:57
Okay. But I’m not,
Speaker 2: 00:13:59
yeah, I’m not sure exactly when that happens. I think, I think it definitely happens during the regular flush where you also flush chain state and all the other rest, but I’m not sure if there’s like a flush right after the initial header sync or something, but in any case, you would run out of memory before you run out of disk space.
Speaker 1: 00:14:16
Yeah.
Speaker 2: 00:14:18
Yeah.
Speaker 1: 00:14:18
Well, at least on a typical machine, you will probably have more disk space than memory. Yes.
Speaker 0: 00:14:26
Cool.
Speaker 1: 00:14:27
All right.
Speaker 0: 00:14:27
Boom. Move on.
Speaker 1: 00:14:29
Okay.
The denial-of-service (DoS) vulnerability in inventory send queue
Speaker 1: 00:14:29
So, yeah, let’s talk a bit about the denial of service due to the inv-to-send sets growing too large. And I think it makes sense to start with a brief recap of transaction relay here.
P2P background regarding transaction relay and inventory messages
Speaker 1: 00:14:42
So in the Bitcoin protocol, we currently have the INV, GETDATA, TX exchange. So that means when we have a transaction that we think another peer doesn’t have yet, we announce it in an inventory (INV) message to the peer, and the peer might respond with a GETDATA asking for the transaction. And then we end up sending the transaction. And in Bitcoin Core, in the implementation, for each peer we maintain a set of transactions that we might want to announce to the peer. And every time we receive a new transaction, we add it to the sets of all the peers that want to receive transactions from us. Yeah, we don’t flood the peers with large INVs, but we rather trickle the transactions out by, yeah, every few seconds building an INV message and draining some transactions from the set. And before we drain, or while we drain some of the transactions, we also skip and remove transactions from the set that aren’t in the mempool anymore, because obviously we don’t need to announce them to the peer. Or transactions that the peer already knows about, for example because they already told us about them. So we know they are aware of the transaction, we don’t need to tell them about it. Yeah. And also transactions that the peer said, hey, I don’t need transactions below this fee rate, below the fee filter. Yeah. And before Bitcoin Core version 25, where the bug was fixed, we drained at a rate of seven transactions per second. I think that’s for inbound peers. I mean, it might be different for outbound peers. I’m not too sure. Maybe Niklas, you know more, or Gloria.
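A rough sketch of the pre-fix behavior being described, with illustrative names rather than Bitcoin Core’s actual classes, might look like this:

```cpp
// Rough sketch (illustrative only): a per-peer inv-to-send set is filled for
// every peer that wants tx relay, then drained a few entries per trickle,
// skipping anything the peer already knows, anything below its fee filter,
// and anything no longer in our mempool.
#include <algorithm>
#include <map>
#include <set>
#include <string>
#include <vector>

using Txid = std::string;

struct MempoolView {                        // stand-in for mempool lookups
    std::map<Txid, double> feerates;        // txid -> fee rate (sat/vB)
    bool Contains(const Txid& txid) const { return feerates.count(txid) > 0; }
    double FeeRate(const Txid& txid) const {
        auto it = feerates.find(txid);
        return it == feerates.end() ? 0.0 : it->second;
    }
};

struct Peer {
    std::set<Txid> tx_inventory_to_send;    // the unbounded set at issue
    std::set<Txid> tx_already_known;        // transactions the peer announced to us
    double fee_filter = 0.0;                // BIP 133 feefilter value
};

// Build one INV for a peer, draining at most max_inv entries per trickle.
std::vector<Txid> BuildInv(Peer& peer, const MempoolView& mempool, size_t max_inv) {
    // The pre-fix code sorted by fee rate on every trickle; this is where the
    // CPU time went once the sets grew to tens of thousands of entries.
    std::vector<Txid> sorted(peer.tx_inventory_to_send.begin(),
                             peer.tx_inventory_to_send.end());
    std::sort(sorted.begin(), sorted.end(), [&](const Txid& a, const Txid& b) {
        return mempool.FeeRate(a) > mempool.FeeRate(b);
    });

    std::vector<Txid> inv;
    for (const Txid& txid : sorted) {
        if (inv.size() >= max_inv) break;                       // leftovers wait for the next trickle
        peer.tx_inventory_to_send.erase(txid);
        if (!mempool.Contains(txid)) continue;                  // evicted or replaced
        if (peer.tx_already_known.count(txid)) continue;        // peer told us about it already
        if (mempool.FeeRate(txid) < peer.fee_filter) continue;  // below their fee filter
        inv.push_back(txid);
    }
    return inv;
}
```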
Speaker 0: 00:16:27
It’s a bit more for outbound peers.
Speaker 1: 00:16:29
Okay. Yeah. I had something like this in mind, but I’m not so sure. Yeah. And each time an INV is constructed, we first sort the set. We order them by fee rate and transaction dependencies. So generally higher-fee-rate transactions are drained from the set earlier. And this has two benefits. Obviously the first one being that higher-fee-rate transactions are more likely to be included in the next block, so we want to relay them faster, and also that we don’t want to reveal the order that we learned about these transactions to a spy node that’s connecting to us.
Speaker 0: 00:17:11
Also replacements. So if a transaction gets replaced a few times, you might as well send the final transaction instead of the earlier ones.
Speaker 1: 00:17:20
Right, correct. Sorry, go ahead. Yeah, no worries.
Observations of increased network activity
Speaker 1: 00:17:25
And in early May 2003... sorry, we noticed some increased network activity that caused these sets to grow faster than they were being drained. So around early May 2023, we noticed some increased network activity that caused these sets to grow faster than they’re being drained. The sets are like an unbounded memory structure, a data structure, and they are drained at some rate, but filled at another rate. So if they’re filled faster than they’re drained, it could be a problem. And at the time, I think this was related to many small one-input, one-output transactions, so a large number of transactions, but really small ones. I think it was related to BRC-20. I’m not too familiar with that. But looking at the chain, we see in March of ’23 about 250,000 of these transactions per day, and close to double that number in May. So definitely higher network activity and more transactions being sent around and being relayed. And that’s only what ended up being in the blocks. But yeah, this means the sets grew larger, and we spent more time sorting them. And this also was amplified by peers that never announce transactions. As I mentioned before, we can skip and remove transactions a peer announced to us before we put them in the INV, because they already know them. And when they don’t announce any, obviously the sets stay much larger for much longer. And there are some peers that don’t announce any INVs to us on the network, but want to receive transactions. And these are usually called spy nodes. Not all of them are spy nodes, but most of them are. And they just listen for INVs, but never announce any to us. And some of them are run by companies, others by universities that do network monitoring or similar. Yeah, and every listening node probably has a few of these spy nodes as inbound connections.
Speaker 0: 00:19:54
And in this case, the spy nodes likely contributed to an increase in time spent sorting these sets, because their sets of transactions that we want to announce to them are much larger. And this ended up being noticeable on the network. For example, one of my monitoring nodes, I recall, had around 190 inbound connections. I had increased the maximum inbound connection number, and it dropped to like 35 connections. So there was definitely a problem here, and looking into it, it turned out that the majority of the time in the P2P communication thread was actually being spent sorting the sets and not handling the communication with other peers. And this went to a point where we couldn’t really keep up with the other peers anymore, or maybe even the other side, the peers, couldn’t keep up with us sending the data, and the pings timed out at some point and we closed the connections. And the effects on the network were, yeah, not ideal. Obviously, we want to keep up with the network as a node. We want to relay transactions, we want to relay blocks, especially for protocols building on top of it as well and mining pools and similar. And I think we also heard from a mining pool that they had problems with it. And we saw like nearly daily stale blocks, where normally we see stale blocks only a few times, three times, a month. Another interesting metric I found at the time was that the KIT research group is comparing ICMP pings, the normal pings, if you type ping in a command line, to the Bitcoin pings. And the ICMP pings, they were normal. The servers the nodes were running on were fine, but the median Bitcoin ping skyrocketed, so that means clearly the server is okay, but the Bitcoin process has a problem. Or in this case, the communication thread was hogging, I think, 100% of the CPU core or thread available to it. And there is one caveat with this data here, that the KIT people run monitoring nodes and they don’t announce transactions to us. So that might have been part of the problem, and their measurements are probably a bit skewed here. But I think in general, that’s an interesting metric to keep in mind, because it means there’s a problem with the Bitcoin network. And there were some quick fixes available. Obviously on the local side you could restart your node to reset the inbound connections and also reset the inv sets. I did this, but this also made debugging the problem harder. Obviously, it takes a while for the inbound connection slots to fill up again and for the inv sets to get large again, to see the problem again. Spy node banlists might also have helped to some degree, and I worked on one, but I never ended up publishing it, but ultimately it’s a network problem. So there needed to be a fix. And I recall it was shortly before the 25.0 release. So we were doing release candidates at the time, and it was a good time, or an okay time, for a last-minute fix, and AJ Towns, Anthony Towns, AJ, looked deeper into this and worked on a fix.
Bitcoin Core #27610 PR to fix the inventory send queue DoS vulnerability
Speaker 0: 00:23:37
So credits to him for fixing this. And he opened a PR called “Improve performance of p2p inv to send queues” with a fix with two commits. One of the commits improves the sorting order of transactions that aren’t in the mempool anymore. So before, we sorted the transactions that aren’t in the mempool anymore to the end, but now we sort them to the beginning. That means when we are constructing an INV, we can throw them out faster, and the next time we sort the set, the set is much smaller. And the second commit actually changed the static drain rate from seven transactions per second, again for inbound peers, to a dynamic rate based on the size of the sets. So larger sets mean more transactions per INV, ultimately just draining them faster if the situation arises again, which would help. And yeah, AJ’s PR got merged in time before the 25.0 release, and since then we haven’t seen this problem again.
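Sketched very roughly (this is not the literal PR #27610 diff, just the shape of the two changes described):

```cpp
// Illustrative sketch of the two fixes described above: (1) entries no longer
// in the mempool sort to the front so they are purged on the very next INV,
// (2) the per-trickle drain count grows with the backlog instead of staying at
// a fixed 7 tx/s for inbound peers.
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

struct AnnounceEntry {
    std::string txid;
    bool in_mempool;     // filled in from a mempool lookup before sorting
    double feerate;      // sat/vB, 0 if no longer in the mempool
};

// (1) Evicted/replaced entries first, then descending fee rate.
void SortForAnnouncement(std::vector<AnnounceEntry>& q) {
    std::sort(q.begin(), q.end(), [](const AnnounceEntry& a, const AnnounceEntry& b) {
        if (a.in_mempool != b.in_mempool) return !a.in_mempool;
        return a.feerate > b.feerate;
    });
}

// (2) Dynamic drain: a baseline of ~35 per 5-second trickle (7 tx/s) for
// inbound peers, plus a slice of the backlog; the divisor here is illustrative.
size_t InvToSendLimit(size_t backlog) {
    constexpr size_t kBasePerTrickle = 35;
    return kBasePerTrickle + backlog / 1000;
}
```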
Speaker 1: 00:24:48
Right. So basically, just from an abstract point of view, you have these massive sets, you’re sorting all of them, and then we had a filtering mechanism after sorting them. Like we’d sort everything, take the top 35, dropping things if they didn’t meet our filters, like they already knew about it or we don’t have it in the mempool anymore or whatever. But really what we should be doing is trying to drop as many as possible so that the next sort is smaller. Yeah. Yeah. Right. So swapping the order for sorting helped a lot. And then the dynamic send rate also helps drain things faster without having privacy problems.
Speaker 0: 00:25:32
Yep.
Speaker 1: 00:25:32
Cool.
Stale blocks and impact on miners
Speaker 1: 00:25:34
I, so I wanted to clarify something about, you were talking about stale blocks. So my understanding is, if this is the message processing thread, it’s blocked while you’re sorting your inv send queue. So you can’t be looking, you can’t be receiving blocks around this time. You’re just spending like 99% of your time sorting. And if that operation takes, you know, 30 seconds, or I don’t know exactly how much time it was, but a long time, it takes you a lot longer to get to the block message that you got from another peer or from that peer.
Speaker 0: 00:26:16
That’s, yeah, that’s, I think that’s one part of it. The other part of it is transaction relay is impacted by this. So mining pools might not have a transaction yet and need to do a compact block request, or need to request the transactions because they don’t have them yet. So that takes longer. So more time to build a stale block. And also I think in general, the nodes of the mining pools were having problems receiving data about new blocks and so on, processing them and making them available for their pool.
Speaker 1: 00:26:54
That’s surprising to me, because I would have expected that the node in charge of producing block templates for a mining pool either doesn’t participate in relay at all, or only with its own nodes.
Speaker 0: 00:27:11
Yeah, I see. Like one peer? Yeah, it might be in an inner group of, like, directly connected peers, but not publicly connected peers, yeah. Interesting. Yeah, but good point, yeah. Maybe not the actual node, but like, blocks and transactions couldn’t really reach the mining pool node in a timely manner.
Speaker 1: 00:27:34
Right. Everyone it’s talking to is spending all its time sorting these sets. Yeah, that makes sense. Cool. Anything else to add?
Speaker 2: 00:27:42
You first realized this because you were looking at your own monitoring tools, right? Correct. Yes. Did you, did that send you a notification, or were you just checking it every day and looking if there’s something weird going on?
Speaker 0: 00:27:58
So as a result of this, I added a notification for when there is a big drop in peers or in general connectivity. But at the time, yeah, I think it was manual checking. And also, like, there were multiple problems at the time, and I recall I was looking at this and we were discussing if this was related to other problems or not. And this was what prompted me to look deeper into it. But yeah, a drop in inbound connections this big is definitely worth looking into.
KIT Bitcoin monitoring website and latency graph
Speaker 2: 00:28:30
And I think it’s cool that you mentioned the KIT website and the ping metric, because I would have brought that up as well. And I think you can, I mean, you can still go to the website and then look back at the charts: the ping time went up extremely, the transaction propagation times went up, the block propagation times went up. Yeah, it’s a pretty cool website. And you could see the effects of this very clearly.
Speaker 0: 00:28:53
Yeah, I’m not sure if you do show notes, but you probably could put a link in there.
Speaker 1: 00:28:57
Yeah. Yeah. I think another interesting chart that people can look at is, you know how mempool.space does a vbytes-per-second metric? So it’s like pretty steady, I think, for the entire graph, except for May 2023, where it goes from like in the tens to like in the tens of thousands or something crazy like that. It doesn’t show this attack in particular, but I think it does show you the, like, circumstances that led to this. Sorry, not attack, but like this situation.
Speaker 0: 00:29:32
Yeah, I think at least from my perspective, I don’t classify it as an attack. I would say more, we were actually quite lucky to observe this this way, coming not from an attack, but from like in general network activity. An attack might be a bit more problematic.
Speaker 1: 00:29:54
Right, but so maybe let’s explore how much it would cost to pull off an attack like this on purpose, because I think it’s kind of hard to synthetically produce this many transactions because you have to pay a lot in fees.
Speaker 0: 00:30:08
It’s definitely not a cheap attack.
Speaker 1: 00:30:11
Yeah, but I guess you could recreate it on testnet.
Speaker 0: 00:30:17
Or Warnet, I think. I think the Warnet idea was born from this actually.
Speaker 1: 00:30:22
Oh, cool. Has it already been created?
Speaker 0: 00:30:26
I’m not up to date on this, but yeah, we should check with them.
Speaker 1: 00:30:30
Why don’t you actually, do you want to describe Warnet real quick for the people who don’t know?
Speaker 0: 00:30:35
So the idea behind Warnet is to have a network of actual nodes, in a simulation fashion, on regtest, where we, or researchers or developers, can replay some of these scenarios in, like, a more controlled fashion. Like you have an attacker and it’s attacking the network, and you can actually see if, for example, a fix helps to mitigate an issue. You might also be able to find new issues, and in general it’s like a research and simulation tool.
Speaker 1: 00:31:08
Cool.
Discussion of disclosure approach
Speaker 1: 00:31:09
Great.
Speaker 2: 00:31:10
I think I have one more thing. My memory of this might be faulty, but I think at the time we didn’t really urge people to upgrade in response to this. Right. Like we didn’t make it public right away. Correct. Yeah. Which I think, since it was already happening, like the behavior was already being triggered, we should have, you know, been more vocal about it, I think. Yeah, I agree. Just like, the thing was already happening. It can’t really get much worse. So yeah.
Speaker 0: 00:31:38
I, yeah, I’m not too sure if we really handled the situation perfectly.
Speaker 2: 00:31:43
But I mean, I’m not blaming anyone in particular. Ultimately no one, I don’t know, felt the agency to do it.
Speaker 0: 00:31:53
I’m just reflecting on if that was... In hindsight. Yeah. We could have been louder in saying, hey, upgrade. But the fix wasn’t out yet. Maybe after the fix was out.
Speaker 2: 00:32:04
Yeah. I mean, like, you know, fix it and then like just say what’s happening and please upgrade because it’s already happening.
Speaker 1: 00:32:11
I do remember we cut 24.1 and 23.2 and then 25.0 with this fix in quick succession. I remember it all happened in like a week or two. Yeah. Which was, I was quite impressed. Like everyone was like, Bitcoin Core moves so slowly. I’m like, well, three releases in like a week is not, not that bad in response. I mean, but yeah, I think people were learning to restart their nodes and that kind of helped them get back on their feet again. And I also wasn’t sure, I remember telling people they should upgrade with the new releases, but I wasn’t telling them why. I think because none of us felt the agency to, like, or not agency, but like, it’s kind of sensitive, right? To be like, hey, there’s a bug. And this was solved quite in the open. Like AJ’s PR is called “Improve performance of p2p inv to send queues”. And it’s backported.
Speaker 0: 00:33:06
And I think we also told a lot of people about, hey, there is a problem. Yeah. Please, developers, be aware, please don’t open an issue or whatever. There is one, yeah, or there is a PR, please review this PR. So it definitely wasn’t too secretive.
Speaker 1: 00:33:26
Right, I remember it was, sorry.
Speaker 2: 00:33:28
But also not as open as it maybe should have been.
Speaker 0: 00:33:30
Correct, yeah. And briefly going back to the timeline, you mentioned the timeline. So in the disclosure we find that it was first observed early May, so the 2nd of May, and the fix was already merged on the 11th of May and 25.0 released on the 18th. So in under two weeks, the fix for it was merged. Obviously, we saw the network effects already. So there was some urgency to get it into 25.0.
Speaker 1: 00:34:04
And then if we want to move on, I guess, Niklas, whenever you’re ready.
The crash vulnerability in compact block relay
Speaker 2: 00:34:10
Okay. Well, let’s move on to the last bug we’re going to be discussing, which is a crash bug in the block TXN message handling logic.
Compact block relay background
Speaker 2: 00:34:20
And I guess it makes sense to give some context on compact block relay for this one. So compact block relay was introduced, I think like 2014 or 2015. And it’s essentially a low-bandwidth way of relaying blocks around the network, and the way this works is that peers will announce a compact block to a peer when a new block is seen. And the compact block includes pre-filled transactions, although currently the only pre-filled transaction is usually the coinbase. And also a list of short transaction identifiers indicating which transactions are contained in this block. And then the receiving node will try to reconstruct the full block just from the pre-filled transactions and looking at its mempool, comparing it to the short IDs that were announced. And if it can reconstruct, you know, it has the full block and can validate it immediately. So you’ve relayed the block at, you know, very low bandwidth cost. If the node is still missing transactions, it will request the missing transactions using a GETBLOCKTXN message, which basically just identifies the transactions that it is missing. And in response, a BLOCKTXN message will be sent, which includes the missing transactions, or ideally, you know, it’s expected to include the missing transactions. So upon receiving the BLOCKTXN message with the missing transactions, the node again will try to reconstruct given, you know, the transactions in the mempool, the now-received missing transactions, the pre-filled transactions and various other transactions that are lying around. Now, since the compact block protocol uses short transaction IDs, there is a potential for these IDs to collide. So transaction A has the same short ID as transaction B, and the chance is quite low. I’m not really sure what the chance is, but it is quite low, but nonetheless, we have code for handling the collision case. And the way the collision is detected is that if the Merkle root of the set of reconstructed transactions does not match the Merkle root in the announced header, then there’s likely been a collision, and the way it’s handled is by falling back to requesting the full block. Right. So this is sort of the background. I think that’s probably all that’s needed to understand the actual bug. So a bit more on the technical details. When receiving a compact block, a node will initialize a data structure called PartiallyDownloadedBlock, which essentially tracks or keeps the state for this, like, compact block exchange. So when receiving a compact block, an instance of this is initialized. And then when you receive the BLOCKTXN, if you’ve asked for missing transactions, it will use that data structure to try to reconstruct, using a method on this data structure called FillBlock. And FillBlock is, or was, expected to only ever be called once. And this was documented in this method using an assert statement. Now the problem is, if you hit the collision case logic, this instance of PartiallyDownloadedBlock will not be wiped, and the underlying data for the block request will also persist. So this leaves room for a second BLOCKTXN message to be received and processed, which will cause FillBlock to be called again on the same instance. And since we have that assert, the node will just crash.
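A heavily simplified sketch of the bug pattern being described (illustrative names and structure, not the actual Bitcoin Core code) could look like this:

```cpp
// Simplified sketch of the bug pattern: the per-peer compact-block state is
// not wiped on the short-ID-collision path, so a second unsolicited BLOCKTXN
// from the same peer reaches FillBlock() again and trips the
// "only called once" assert.
#include <cassert>
#include <optional>
#include <vector>

struct Tx {};

enum class ReadStatus { OK, FAILED, CHECKBLOCK_FAILED };

struct PartiallyDownloadedBlock {
    bool filled = false;

    ReadStatus FillBlock(const std::vector<Tx>& /*missing_txs*/) {
        assert(!filled);                    // pre-fix: "called once" documented with an assert
        filled = true;
        // ... reconstruct the block from mempool txs + the missing txs,
        //     then check the Merkle root against the announced header ...
        bool merkle_root_matches = false;   // pretend a short-ID collision happened
        return merkle_root_matches ? ReadStatus::OK : ReadStatus::CHECKBLOCK_FAILED;
    }
};

struct PeerState {
    std::optional<PartiallyDownloadedBlock> partial_block;  // one compact block in flight per peer
};

void OnBlockTxn(PeerState& peer, const std::vector<Tx>& missing) {
    if (!peer.partial_block) return;                        // no compact block in flight
    ReadStatus status = peer.partial_block->FillBlock(missing);
    if (status == ReadStatus::CHECKBLOCK_FAILED) {
        // Collision path: fall back to requesting the full block...
        // ...but the state was NOT reset here, so a repeated BLOCKTXN from the
        // same peer calls FillBlock() again and hits the assert -> node crash.
        return;
    }
    peer.partial_block.reset();                             // normal path wipes the state
}
```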
Speaker 1: 00:38:09
So you can be working on multiple partially downloaded blocks at the same time?
Speaker 2: 00:38:13
Yes. So you’ll have a partially downloaded block per peer.
Speaker 1: 00:38:17
Okay. And.
Speaker 2: 00:38:18
Yeah. So it’s, it will be the same peer that sends you twice, sends you two block TXN messages.
Speaker 1: 00:38:24
All right. So you have two blocks that you’re working on with the same peer?
Speaker 2: 00:38:30
No, no, no, sorry. So it’s, it’s only one block. And yeah, so the peer announces the compact block to you, you ask for the missing transactions, it will send you the BLOCKTXN, and I’ll get into how the actual attack would work, but let’s just assume there’s a collision. In that case, you ask for the full block, and in the normal protocol, your peer would respond with the full block. But since we’re not wiping the state for the compact block download at this stage, in theory the peer could send another BLOCKTXN message for the same block again, and that would cause the assert to be hit.
Speaker 1: 00:39:05
I see, so they’re just responding to you twice.
Speaker 2: 00:39:07
Exactly, but in the normal protocol flow, this wouldn’t happen. So like, absent an attacker, this wouldn’t randomly happen. Like, to cause the crash, you would need to, you know, not act according to the protocol.
Speaker 1: 00:39:18
Is it as simple as sending it twice? Is there no other, like, logic that says, hey, if a peer sends me the same thing twice, they’re violating protocol, so I disconnect them? What does BIP 152 say?
Speaker 0: 00:39:33
I’m not sure what the BIP actually says, but yeah, there’s no, there’s no extra logic for this. And it is actually as simple as sending it twice. So there was, there was a functional test. Well, I’m pretty sure it still exists. There’s like a functional test where, if you duplicate one line so that a BLOCKTXN is sent twice, that functional test will find this crash.
Mechanics of a potential attack
Speaker 0: 00:39:55
But yeah, so in an actual attack, the attacker does not need to get lucky with the collisions. Because the collisions are detected by checking the Merkle root, all the attacker needs to do is include or omit transactions from the BLOCKTXN such that the Merkle roots won’t match. So that’s what causes the collision logic to run. And then you can just send a second BLOCKTXN, which can be the same BLOCKTXN again, and then you will hit the crash.
Speaker 1: 00:40:21
Oh, that’s really easy.
Speaker 0: 00:40:24
Yeah. Yeah. And I guess you could, yeah, like technically you need a new block to be able to do this. Yeah. I was going to ask. Yeah. But luckily, you know, there’s these nice people that give you new blocks every 10 minutes. Right. So you can just use the blocks that other people are mining for you. You probably, if you want to take down the whole network, you probably have to be quite quick about being the first to relay it to everyone.
Speaker 1: 00:40:51
And you have to, go ahead.
Speaker 2: 00:40:54
You would have to write a client that modifies this, or just resends it, to duplicate the sending of this.
Speaker 0: 00:41:00
Yeah. Yeah. You would have to have custom software that does the whole thing.
Speaker 1: 00:41:03
Right, and you would have to send these individually, right?
Speaker 0: 00:41:07
Yeah, yeah, this does not propagate. Like you have to go node by node.
Speaker 1: 00:41:10
Right, so this is more severe, I think, but it’s in contrast to the TX relay one we just talked about, where because everyone’s forwarding transactions as well, like the network kind of does it by itself. But this, you have to actively be connected to everybody and send these malicious messages really quickly and on very recent data.
Speaker 0: 00:41:33
Yeah, yeah. I think the worst kind of, I don’t know if we’ve ever had a crash bug like that, but like the worst kind of crash bug would probably be you send a message to one peer once and somehow the entire thing propagates to everyone and then everyone crashes. But yeah, I don’t think we’ve ever had something like this, and it’s probably quite unlikely that we would see this as well, because all the networking code is synchronous.
Speaker 1: 00:41:55
Right. How?
Speaker 0: 00:41:57
Yeah. You would crash first and then not be able to relay anything. So. Right. Unless it’s like a timer or something. I don’t know. Like, yeah, it seems unlikely that we would see this with the current architecture, I think.
Speaker 1: 00:42:10
I’m trying to get a sense of how often collisions happen in the wild. According to BIP 152, it should happen at most about once in 281,000 blocks. So basically almost never in practice do we see this happen. Is that accurate? I don’t know if B10C has a monitor for this.
Speaker 0: 00:42:39
Not yet. Not yet.
Speaker 1: 00:42:41
Yeah. So maybe start monitoring and in 10 years you’ll find one. Interesting. Okay.
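For a rough sense of where a number like that comes from: with BIP 152’s 6-byte (48-bit) short IDs, and assuming, say, a 10,000-transaction block checked against a 100,000-transaction mempool, the expected number of collisions per block is about 10,000 × 100,000 / 2^48 ≈ 3.6 × 10^-6, i.e. roughly one collision every 280,000 blocks, which is in the same ballpark as the figure quoted above.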
Discovery of the vulnerability
Speaker 2: 00:42:49
Niklas, can you tell us a bit about the discovery of this?
Speaker 0: 00:42:52
How did you discover this? Yes.
Speaker 1: 00:42:54
Right. Credits. Niklas discovered the bug and fixed it. There you go.
Speaker 0: 00:43:00
What was the purpose of it? Yeah, so at the time I was working on refactoring the block download logic. Like, currently it’s all mangled up in, like, one big file, and I was trying to extract the logic into its own class to encapsulate it. And sort of my initial refactoring work was done, and then I wrote some fuzz tests for that encapsulated module, and I kept running into this assert. And at first I thought, since I, you know, extracted the logic, maybe I’m just, you know, not using it in the same way as it is being used in net processing. So it might be that this assert cannot actually be hit. So what I had to do is go and manually review the net processing code to see if, you know, it can actually be triggered, and it turned out that it can. So even though the test sort of operated at a different layer, it still turned out that this is an actual issue.
Speaker 1: 00:44:02
So typically, when you’re refactoring net processing stuff, ’cause I’ve done this too, you start, you like create the fuzzer and then, like, it crashes a lot. And I think, I mean, at least for me, I haven’t actually found an actual bug yet. It’s usually a problem in the fuzz harness. Yeah.
Speaker 0: 00:44:20
There’s this meme where it’s like a chart of, like, how many bugs you find and how many of them are in the, in the harness itself and how many are actual bugs. And usually, when you write or develop a new harness, like, you know, you’ll have a bunch of crashes and bugs in your test and then maybe one or two at the end in the actual code.
Speaker 1: 00:44:42
Is it like a fuzzing community where this meme is circulated? You said that there’s this meme.
Speaker 0: 00:44:47
Yeah. Well, I know I just follow a few people that are into fuzzing on Twitter and I think I saw it at some point.
Speaker 1: 00:44:57
Cool. So did you feel slightly, like, I don’t know, to me, I didn’t know this bug and I didn’t read about it before we hopped on this podcast. So this is the first time I’m hearing about it and I’m almost a little bit, sorry, but it’s a little funny how easy it is to hit. And you said there’s a functional test that already tests it?
Speaker 0: 00:45:21
Well, we have functional tests for the compact block relay logic, obviously, and you can make a very simple modification to one of these tests to trigger the crash. Oh,
Speaker 1: 00:45:32
so it’s not part of the test suite.
Speaker 0: 00:45:35
You would have to modify the test and duplicate the line. I think that specific test is testing that collision case by, like, changing the transaction so that the Merkle root doesn’t match, and then to check that the full block is requested. So if you just duplicate the sending of the BLOCKTXN in that test case, you will find the crash.
Speaker 1: 00:45:57
Wow.
Speaker 0: 00:45:58
So in hindsight, that’s like the perfect reproducer. Very simple to demonstrate.
Speaker 1: 00:46:03
Right. But you don’t know that at the time. So what was the fix? Because I’m imagining, again, I also didn’t... You said it might be good for me to react to this raw, which is, you know, I’m not coming unprepared, it’s just like, we wanted this to organically happen on the podcast. But so then, like, my naive solution or my naive thinking that I already kind of asked about was like, why isn’t this a protocol, like, requirement that you never send a second BLOCKTXN message? Or what was the fix? So I think what you’re describing could be a fix. It wasn’t the actual fix that we went for.
Speaker 0: 00:46:47
I’ll have to go and look if what you’re describing would make sense, maybe now. I think at the time we probably didn’t want to go for that because it might make it too obvious. Sure. Of course. But now, yeah, like, I think one of the lessons learned for me from this is that you should absolutely not write code like this, where, like, a function is only supposed to be called once and then you have an assert that documents this. Like, that’s, yeah, that’s not a great way of doing it. I think there are more elegant ways of actually avoiding that something can be called twice.
Speaker 1: 00:47:18
So do you do an Assume today, and it’s like, if not Assume this, then just exit out, so that you do catch it in debug?
Speaker 0: 00:47:28
So the actual fix... Okay, so yes, if you do want to have assert statements like this, then use an Assume. Well, I mean, for Bitcoin Core specifically, the Assumes are basically asserts, except they’re not enabled in production builds. So usually we, you know, we add an Assume statement, but then we also add the logic for handling what if the assumption doesn’t hold. So in our testing, we would see if our assumptions don’t hold, but in production, this won’t cause your node to fall over.
Speaker 1: 00:47:54
Right.
Bitcoin Core #26898 PR to fix the crash vulnerability in compact block relay
Speaker 0: 00:47:56
Yeah, but the actual fix for this did not involve an Assume, it just removed the assertion. And if you would hit the assertion, you just return false, or you return some error code from the function, and that, you know, that’s that. Yeah, I was just saying, sorry, go ahead. Yeah. And we, I mean, the fix was hidden in a PR that added a fuzz test for the PartiallyDownloadedBlock class.
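The shape of that fix, sketched very loosely (not the literal change in the PR; the Assume stand-in is only there to show the pattern discussed a moment ago):

```cpp
// Sketch of the fix shape: the assert is gone, a repeated call just returns an
// error code. The Assume() stand-in mirrors the pattern described above, where
// the check only aborts in debug/test builds, never in production.
#include <cassert>

enum class ReadStatus { OK, FAILED };

// Minimal stand-in for an Assume-style check: asserts only in debug builds,
// and always evaluates to the condition so callers can handle the failure.
inline bool AssumeImpl(bool cond) {
#ifndef NDEBUG
    assert(cond);
#endif
    return cond;
}
#define Assume(cond) AssumeImpl(cond)

struct PartiallyDownloadedBlock {
    bool filled = false;

    ReadStatus FillBlock(/* missing transactions */) {
        if (!Assume(!filled)) return ReadStatus::FAILED;  // unexpected second call: no crash
        filled = true;
        // ... reconstruction as before ...
        return ReadStatus::OK;
    }
};
```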
Speaker 2: 00:48:23
All right. I was going to mention this. Compared to the earlier bug we talked about, this was fixed in a more secretive way, like, hidden from the view of people.
Speaker 0: 00:48:35
Yeah, we were, yeah, since this is quite severe, we were definitely trying to make this, yeah, non-obvious. And I mean, the first approach was to go with my large refactor of the whole thing, but yeah, I think that, I think I’m glad that we went with the second approach, because that’s a, yeah, a much easier, smaller change. But anyway, the fix was to add the test, and then the excuse for removing the assert in that PR was that the fuzz test otherwise runs into the assert, not that it can actually be hit. But also, literally, we know now that it can be hit.
Speaker 2: 00:49:11
Yeah. And it feels to me like people don’t look as closely as you might think at fuzz test harness changes.
Speaker 0: 00:49:25
Yeah, it seems harmless then. Yeah. No functional change. So who’s going to look at that?
Benefits of modularizing code
Speaker 1: 00:49:33
Right. It seems like these modules... So I remember actually, Niklas, you kind of taught me this, and so I’ve started doing this, where kind of all of the dragons are in the net processing logic, and then you build these modules, or you can modularize these, like, partially filled, partially downloaded blocks or, like, you know, the tx request tracker and whatnot. And those should be very robust, so they can handle when the net processing logic does something really dumb and breaks these assumptions. And that also means you can easily fuzz the modules without very much state and just see, like, yeah, I can handle anything that you throw at it. And then because the net processing logic is more difficult to fuzz and test, it just requires much more setup, then like this is a good way to kind of reason about, okay, this side we know less about, but this side is really... I’m using my hands as if you can see me, but like the net processing stuff we know less about, but the modules we know are robust. Is that an accurate way to say what you’re suggesting?
Speaker 0: 00:50:47
I think it’s a good, yes, I think so. And I think it’s a good step of, like, you know, you modularize and then you test the module, but we ideally also want to go one layer out and, like, test through the P2P interface, because if you have, you know, something like this with the asserts, if you write tests, you’re obviously going to try to avoid the asserts, because otherwise the test is failing, but then maybe in production you can actually hit the assert. So you definitely want to go one layer out to have the test and see if you can hit the assert in, like, a more close-to-production kind of environment. But I would fully agree that, you know, what we’re doing with modularizing and then testing the modules is, yeah, definitely a great step, but I think there’s still, like, room for improvement going further out as well.
Speaker 1: 00:51:36
Yeah. I also think your trick of adding a fuzzer in a refactor might not work in the future, because with my tx download modularization plus fuzz, I’ve gotten, I’ve already gotten a DM from a reviewer that was like, hey, is this a hidden bug fix? Like, they pointed to this line of code that was changed. And I think it was just changed because it makes sense, but it’s not a vulnerability. They’re like, hey, is it okay if I comment? I just wanted to ask you just in case.
Speaker 0: 00:52:11
Yeah. Yeah. I guess that’s the downside to like pointing out how we fix stuff because then it gets kind of more obvious. Yeah, but… You know, there’s no shortage of creative ideas of how to fix stuff.
Speaker 1: 00:52:25
Yeah, and I think there’s a lot of like, it’s interesting, we have different levels of like secretiveness in the three that we’re talking about today. Sorry, I’m already doing the, we do summaries of the things that we talked about. Do you want to add anything to the third topic?
Speaker 2: 00:52:43
I would have a question, Niklas.
Speaker 1: 00:52:44
Okay.
Speaker 2: 00:52:45
Do you think this will be exploited on the network given this is being announced and people are still running old versions, given how easy it is to exploit?
Speaker 0: 00:53:00
I don’t know. I mean, maybe, but I think, I mean, it’s possible, but I think if we release this, you know, the people that haven’t upgraded so far should upgrade, and any attacker knows this as well. So if you’re like planning something serious, you kind of have a gun to your head now, because the fix is available and the knowledge is out. Yeah. So, like, the cost of developing the exploit might be too high.
Speaker 2: 00:53:24
It might be more someone doing it for fun.
Speaker 0: 00:53:27
Yeah. So I could imagine someone scripting something, trying to do it. Right. That’s probably it. Yeah. Right.
Speaker 1: 00:53:34
And we have a functional test, I guess.
Speaker 0: 00:53:38
Yeah. I mean, there’s no, you know, no exploit code or anything is public. So you can try recreating the attack from this, but I think to pull something off in a good way, you need to put in quite a bit of work. So good luck.
Speaker 1: 00:53:54
Do you think it’s worth adding the protocol change now? The protocol change? Like if somebody sends you twice, then you disconnect. Yeah.
Speaker 0: 00:54:04
I think we might want to change it to disallow this. But I’m not sure. Yeah. I don’t know. We’ll have to see. It could make sense. Yes. Like, you know, now that it’s public, let’s do the actual fix. That is probably more robust.
Speaker 1: 00:54:18
I also have a question about the prefilled transactions. I guess it’s more about compact blocks, but do we need a protocol change to start adding more things to prefilled, or can we just do that in implementation logic?
Speaker 0: 00:54:31
I think we can just... so the logic for handling the pre-filled transactions exists. The logic for putting more stuff into that set does not exist. So that means that if we come up with some smart way to pre-fill more transactions tomorrow, all nodes would still be able to handle those new transactions that are now added. So I guess, no, we won’t need an actual protocol change, just the change to, like, actually add stuff to the pre-filled set.
Speaker 1: 00:55:00
Right. Because, you know, as we mentioned in the inv send queue thing, we have somewhat of an idea of what transactions our peers already have. And we use that logic to not announce transactions to them, but we could also use it to decide which transactions to pre-fill.
Speaker 0: 00:55:19
Yeah. If you know something is likely not to be present in the other node’s mempool, you should probably pre-fill it.
Speaker 2: 00:55:28
You could also pre-fill transactions you didn’t know about and needed to request.
Speaker 0: 00:55:33
Yeah. So like, maybe, you know, maybe you could pre-fill, I don’t know, non-standard transactions from the block you just validated. All right. Yeah. Maybe that makes sense.
Speaker 1: 00:55:42
Right. I think just the ones that you don’t have is a good idea because then you don’t need to like process as much. You just forward the same stuff.
Speaker 2: 00:55:51
And I think the original goal was to implement this, but at least in my head, there is a big TODO for doing this.
Speaker 1: 00:55:58
Yeah. I saw it in the code too. There’s a TODO. Like, why hasn’t anyone done this?
Speaker 0: 00:56:03
Yeah. I think, I think the answer is just like nobody has. Yeah, it is certainly possible, but no one has taken the time to do it.
Speaker 1: 00:56:11
I guess it’s probably a data question.
Speaker 0: 00:56:13
Yeah. Yeah. Like you would have to come up with some solid data of like what would be good to pre-fill.
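Purely as a hypothetical illustration of the kind of heuristic being discussed (no such selection logic exists in Bitcoin Core today, and every name below is made up):

```cpp
// Hypothetical sketch: when announcing a compact block, pre-fill the
// transactions we ourselves did not have in our mempool when the block
// arrived, on the theory that peers are likely missing them too.
#include <set>
#include <string>
#include <vector>

using Txid = std::string;

std::vector<Txid> SelectPrefill(const std::vector<Txid>& block_txids,
                                const std::set<Txid>& was_in_our_mempool,
                                size_t max_prefill) {
    std::vector<Txid> prefill;
    for (const Txid& txid : block_txids) {
        if (prefill.size() >= max_prefill) break;
        if (!was_in_our_mempool.count(txid)) prefill.push_back(txid);
    }
    return prefill;
}
```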
Speaker 1: 00:56:19
Right. Well, good thing we have the data person on this call.
Lessons learned
Speaker 1: 00:56:25
Okay. Should we start thinking about overall thoughts? I can start.
Speaker 0: 00:56:31
All right.
Speaker 1: 00:56:33
So I wanted to talk about kind of, it’s interesting how different the secrecy levels of all three of these are, because the pre-sync one is like, yeah, everybody knows this. We’ve known this since 2014 and it was kind of solved in the open, though they tried to put this close to the release so that, you know, there’d be a fix immediately after bringing it back into the public consciousness. And then yeah, the inv queue one, like, pretty much out in the open, but again, pretty urgently cut a release afterward. And then this one’s, like, super secretive. But I think going back to what Niklas said about, like, we should really urgently, like, tell people to upgrade when they’re released. I mean, this is why at a meta level we’re doing all of this, right? Because I think when you say, hey, we should release, or sorry, you should upgrade to the new release, that should come along with, by the way, there’s a bug and we plan to tell you what the details of it are. I guess in this case, in a year and a half. But that’s better than nothing, right? Like if you’re asking people to do things, but you are telling them why later on, because, you know, it’s not just, hey, trust me, you know, we’re the maintainers and we know better. It’s, you know, we, as the maintainers, we asked you to do something and we had a really good reason to do that. I don’t know. Do you agree?
Speaker 0: 00:58:03
Yes. I think, I think so. I, I think we might, like, we currently wouldn’t do it, but I think it might make sense to, like, maybe also, you know, pre-announce the severity or what kind of bug it is, because depending on what it is, you might have different trade-offs for upgrading. Like these kinds of bugs where you can connect to someone and crash them, they are more severe for, you know, folks that run a Lightning node, for example, or like a Lightning node with a Bitcoin Core node that publicly listens for incoming connections. Like if you’re, I don’t know why you would be doing that, but, yeah, if you’re, you know, if, if uptime is a big requirement for you.
Speaker 2: 00:58:43
Oh yeah. A mining pool versus inbound connections. Probably. Yeah. So it’s also, also the same. Yeah.
Speaker 0: 00:58:49
Yeah. Like, I mean, you know, if you don’t run a public-facing node, it’s probably less bad for you. It’s still not great, because someone, you know, can wait until you connect to them and then crash you, but yeah, I think I’m just trying to say, like, different vulnerabilities may cause, you know, a different set of people to be more urgent about upgrading. So we could, we could try to communicate that. We currently don’t.
Speaker 2: 00:59:11
But yeah. That seems a little bit…
Speaker 0: 00:59:13
It might also give away too much.
Speaker 2: 00:59:15
Yeah. And also if we create a release and we don’t mention that there are vulnerabilities, will people just not upgrade?
Speaker 0: 00:59:23
Oh yeah, that’s kind of what we’ve had for the past couple of years.
Speaker 2: 00:59:27
Right, exactly. And people didn’t upgrade. But now it would be even more obvious, right? ’Cause they’re like, oh, well, you didn’t say anything this time. So obviously, 26 is the last good version or something. Yeah. Yeah, that’s true. So maybe not too much, but I really like that there’s so much transparency with kind of the way that these advisories are being sent out.
Speaker 0: 00:59:54
Yeah.
Speaker 2: 00:59:55
Okay. My next, like, observation, or like adjective that differs across these three vulnerabilities, is, like, how simple it is. Both in, like, what the attack is and what the fix is. It feels like the headers pre-sync thing is just a really hard problem. Like, you want to hear about new stuff, but, like, it’s from unknown peers, and it might be garbage. Like that seems like a really difficult problem and it was thought about for many years. And the fix is also fairly complicated. Like I remember the review being quite involved. But the other two seem to be slightly lower-hanging fruit. Like the fact that you can just send two messages, or the fact that, like, we had this simple standard library set and we had this sort function and we kind of just assumed that it would never be very big. And so the processes around it, at least I felt like it was pretty low-hanging fruit. I don’t know how you feel.
Speaker 1: 01:01:07
Yeah, I agree. That’s like an unbounded data structure.
Speaker 2: 01:01:11
Yeah.
Speaker 1: 01:01:12
From some external thing happening on the network. Obviously it costs a bit of money in fees to fill it, but yeah.
Speaker 2: 01:01:20
Yeah. And.
Speaker 1: 01:01:23
And we have it for every peer. So up to normally like 125 peers.
Speaker 2: 01:01:30
Right. Yeah. I remember, like when we were reviewing the PR, we ended up with a list of like 10 things we could do to make this trivially much better. Just because, yeah, like I said, you’re just sorting it and then you’re assuming, okay, take the first 35, and then the rest you’ll sort again later. And usually there are no more than 35, but sometimes there’s tens of thousands of elements after the first 35.
Speaker 0: 01:02:02
I remember when I found the BLOCKTXN crash bug, I was pretty shocked. Like, all right, if I can find something like this, then surely, you know, people with more experience probably have an easier time. It was kind of the thing that got me really deep into the whole fuzzing rabbit hole. And like ever since then, all I’ve been doing is, you know, security-related work pretty much.
Speaker 2: 01:02:27
You just keep finding bugs.
Speaker 0: 01:02:32
But yeah, I will say it’s been two years now, and like this kind of bug, although that did seem like low-hanging fruit, I think my conclusion is, like, the bugs that are probably in Bitcoin Core are, like, in the untested bits. Like, now looking back, the partially downloaded block stuff and the net processing code relating to compact block download was not being fuzzed at all. So, you know, the chances are quite high that you will find a bug there. But overall, Bitcoin Core’s testing is quite extensive. So if we have low-hanging fruit, it’s probably where the, you know, testing coverage is lacking.
Speaker 2: 01:03:08
Right. And net processing is the place that’s really low coverage. I think, yeah, it would be unfair to say that Bitcoin Core has bad coverage overall. It’s just this very particular area of the code that’s so difficult to test and is also so critical to security. Because I guess we can explain: net processing is like, in order to write unit tests for it, for example, you need to have, like, an entire chain state. Like if you want to send it blocks, you have to, like, produce blocks to send to it, because it’s, I don’t know how to put it, granularity-wise or, like, in terms of layers in the interface, it’s in this, like, middle zone where it handles a lot of application logic and is also really, really big and complicated. Yeah, I don’t know, maybe you have a better explanation.
Speaker 0: 01:04:06
I mean, I think in the technical sense of like writing a unit test, it’s almost not even possible because it, like, it isn’t just one unit, there’s like so much going on and it’s all very entangled at the moment. I mean, it’s, it’s been improved over the years for sure. Like the trend of, you know, encapsulating things and separating them out has been ongoing, but it’s still like, you need various things, you need to go to disk, get a bunch of IO and like writing an actual unit test for anything is almost impossible I’d say. So yeah, I, I think, you know, even if we don’t have any bugs right now, if we keep developing the software, we should refactor this stuff to be more modular so that we can have more fine grain tests, et cetera, et cetera, right? Like this is pretty basic stuff, I think, but obviously it takes manpower and hours to review. So yeah, it’s not easy to get this over the line.
Speaker 2: 01:05:01
And we should be spending all our time on this or like 60% of our energy, I think. Anyway, cool. Anything, any other thoughts?
Speaker 0: 01:05:12
No, that’s all I think.
Speaker 1: 01:05:13
Yeah, for me as well.
Speaker 2: 01:05:14
Great, Well, thank you so much for doing this. We’ll see you next time.
Speaker 0: 01:05:19
All right. All right. Thank you. Bye.