Bitcoin network partitioning & network-level privacy attacks
Speakers: Ethan Heilman
Date: June 12, 2019
Transcript By: Adam Jonas
Tags: Privacy problems, P2p, Eclipse attacks
Media: https://www.youtube.com/watch?v=StnOVBbIpD8
Location: Chaincode Labs 2019 Residency
Slides: https://residency.chaincode.com/presentations/bitcoin/ethan_heilman_p2p.pdf
Eclipse attack paper: https://eprint.iacr.org/2015/263.pdf
Bitcoin’s P2P Network
Please interrupt me. I’m going to go through this stuff a little bit quickly because I hear that you’ve already read Eclipse attack paper and you’ve already looked at some of the peer to peer stuff.
Bitcoin uses a peer-to-peer network to announce transactions and blocks. A node makes 8 outgoing connections. These are TCP connections initiated by the node. It accepts up to 116 incoming connections, which are initiated by someone else. And it makes one feeler connection under some circumstances, and I’ll talk more about this feeler connection later. You can actually treat the feeler connection as an outgoing connection. It is an outgoing connection, but it has some different rules, so I just broke it out separately, but if you want to say that there are nine outgoing connections counting the one feeler connection, you can say that as well. I want to point out that when we talk about outgoing connections, we’re always talking from the perspective of a node. If you look at this purple node, it has an outgoing connection to this yellowish node. But from the yellowish node’s perspective, this is an incoming connection. Whenever we talk about outgoing and incoming, it’s really a question of which node initiated the connection. Every node’s outgoing connection is another node’s incoming connection.
So nodes store the IP address of other nodes in two tables. The new table stores the IP addresses the node has heard of – this might be another node that they could connect to, but they’ve never actually made an outgoing connection to that node, so it could just be total garbage. In fact, it’s pretty easy to fill the new table up with garbage. It could give you nothing. It could give you just the stuff it wants you to know about. It decides what access you get. This is referred to in the P2P networking world as an eclipse attack. This blue node is eclipsed.
Example 1: 51 percent attack with 40 percent mining power
We’re going to talk later about how you can eclipse a node, but for now we’re just going to look at what bad things can happen when you’re in that position. A malicious party controls access to information for particular nodes. One thing that can happen is that if you were to partition a miner from the rest of the network – so you’ve stuck 30% of the mining power in a partition, and you’re the attacker here, and you also have 40% of the mining power – because these two miners don’t learn about each other’s blocks, they will build independent forks. And because this is 30% and this is 30%, if the attacker has 40% of the mining power, they can build a longer chain. So by partitioning the Bitcoin network, you can actually engage in what would be a 51% attack without a majority hash rate.
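The arithmetic behind this example can be sketched in a few lines. The percentages are the ones from the talk; the function name and the simplifying assumption that each isolated group's fork grows in proportion to its hash share are illustrative, not part of the talk.

```python
def longest_chain_winner(shares):
    """Return the group expected to build the longest fork.

    Simplifying assumption: each group only extends its own fork, and
    expected fork length is proportional to that group's hash share.
    """
    return max(shares, key=shares.get)


# Honest miners split into two partitions that cannot see each other's blocks.
partitioned = {"honest_a": 0.30, "honest_b": 0.30, "attacker": 0.40}
# Without the partition, the honest miners all work on one chain.
unpartitioned = {"honest": 0.60, "attacker": 0.40}

print(longest_chain_winner(partitioned))
print(longest_chain_winner(unpartitioned))
```

With the partition in place the attacker's 40% is the largest single share, even though it is well short of a majority of the total.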
Example 2: N-Confirmation Double Spending
Another thing that you can do is partition, say, a merchant and some mining power. The miner won’t learn about the blocks that are being created over here, so the miner will create its own fork, and then you can double-spend: you send a particular coin to the merchant, and you send that same coin to yourself. The merchant only sees this blockchain and only sees this transaction, so they don’t even realize there’s a double spend. Then, when you stop partitioning the merchant and the Bitcoin miner, their blockchain will go away, because they’ll realize that this other blockchain is the one with the most weight, and you’ve double-spent. And notice in this case, you don’t actually require any mining power yourself because you sort of use this miner to perform this attack by putting it in a partition, but this miner may just be an innocent victim that you’ve managed to partition.
Eclipse attacks: Other ramifications
Some other bad things can happen. A lot of layer 2 protocols, including Lightning, have this notion that when you see a particular transaction, you have an adjudication period to post a breach remedy transaction, and this security relies on the censorship resistance of Bitcoin. It’s assumed that if you see someone has posted an old transaction, you have the breach remedy transaction and can post it, but if you are eclipsed, the eclipsing attacker could just censor that transaction. So if you are a Lightning node and you’re eclipsed from the Bitcoin network, the attacker can break the security of Lightning. Lightning fundamentally assumes that Bitcoin is censorship-resistant and that you can announce your transactions. Right, so you could imagine the attack you’re talking about, where I know your IP address, and I also have a Lightning channel with you: I post an old Lightning state, and every time you announce the breach remedy transaction, I just drop it on the ground.
However, you can have situations where maybe the party that eclipses you is not necessarily malicious, and we’ve actually seen this happen. Chainalysis [company] back in 2015 eclipsed a bunch of people and was dropping their transactions on the ground, but announcing transactions to them. And so it created a situation in which those people weren’t able to announce their transactions. So it might be that the eclipsing party is not even in a conspiracy with the party that’s cheating in the Lightning Network, but you really want to make sure you’re not eclipsed if you’re using Lightning or doing layer-2 stuff.
Q - Inaudible…
A - So I think this is a good question, because normally in Bitcoin, it sucks if your transaction is censored, but it doesn’t necessarily mean that you get cheated if your transaction is censored within a time frame. But with layer 2 protocols, we’re now making that assumption, and so I think it would be good for people to think about it. Maybe there’s the ability to get a warning, and you export that transaction from your wallet and post it to a block explorer or use some additional means. But I think it’s a good thing to think about what happens if someone were to actually try to apply this attack at scale to the Lightning Network. Maybe they go and just take a botnet and start attacking all the Lightning nodes and knocking them offline, and they hope they can knock enough offline that they can steal from someone’s channel. But I think that’s an interesting project to look at.
Privacy - If you are the only connection that a node has to the Bitcoin peer-to-peer network, and you announce a transaction, they know that transaction came from you. So if they eclipse you, they can determine which transactions originated from your node.
And then forks. It’s less common now, but you can imagine situations where you have a two-block fork in the Bitcoin network, and someone could just show you one side of that fork so they could double-spend a lot. If there’s ever a two-block fork, they show you the losing side of that two-block fork with the double-spend, and you think, “oh well, I’ve got one confirmation on top of that, or two confirmations on top, that’s good,” but what you don’t see is there are actually 6 blocks on the other fork. Does that make sense to everyone?
On-path, Off-path, In-path Attacks
So how can this happen? There are on-path or in-path attacks; I’m going to define on-path and in-path attacks on the next slide. There are off-path attacks that manipulate the peer-to-peer network. Everyone read the Eclipse paper, and the Eclipse paper is generally talking about off-path attacks that manipulate the peer-to-peer network. There are DNS attacks that poison the tables: when a Bitcoin node is first booted up and has nothing in its peer tables, it uses DNS to learn about the other nodes in the system, and you can attack it that way. And then we’ll look at some other bad things that can happen.
So this on-path, off-path, in-path distinction is a network security distinction. Off-path, you’re basically the attacker here. You can send messages to the victim, you can receive messages from the victim, but the victim can send messages to other people that you can’t see. So you can’t see these, and you can’t interfere with their traffic. That’s normally the security model when you just connect to a website: an off-path attacker can’t see the other transactions and can’t pretend to be other people in the network. With on-path, the attacker can send messages to the victim and receive messages from the victim, but they can’t drop packets. So they can see the packets that get sent, and they can inject their own packets, but they can’t actually drop packets. And then finally, an in-path attacker, and this is usually what people think about with man-in-the-middle attacks, is an attacker sitting directly between the victim and the rest of the network. So if the attacker sees a packet it doesn’t like, it doesn’t relay it, it just drops it on the ground. And one of the important distinctions here between in-path and on-path is that in-path is generally very expensive for attackers, because they get a lot of traffic. They have to buffer this traffic and send it on, so they will introduce a lot of latency, whereas on-path is much cheaper. You can just sit there and tap a network line and see the packets coming through and then inject traffic. So a lot of censorship frameworks will use on-path for large amounts of data because it’s just cheaper to do, and then when they really want to do a targeted attack, they’ll move to in-path. So a lot of the Great Firewall of China is on-path, not in-path. And if you look at the quantum insert, some of the NSA stuff, they were on-path attacks. They see the traffic, and they send new packets.
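One way to keep the three positions straight is to write down what each one can do. This is only a summary of the distinctions described above; the class and field names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttackerPosition:
    name: str
    talks_to_victim: bool       # can exchange its own messages with the victim
    sees_victim_traffic: bool   # can observe the victim's packets to other hosts
    injects_into_traffic: bool  # can forge packets into the victim's other connections
    drops_packets: bool         # can stop a packet from being delivered at all


OFF_PATH = AttackerPosition("off-path", True, False, False, False)
ON_PATH = AttackerPosition("on-path", True, True, True, False)
IN_PATH = AttackerPosition("in-path", True, True, True, True)

for position in (OFF_PATH, ON_PATH, IN_PATH):
    print(position)
```

The only capability that separates on-path from in-path in this model is dropping packets, which is also what makes in-path the expensive position to hold.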
On-path attacks
So an on-path attacker can also eclipse you. In Bitcoin, an on-path attack actually functions a lot like an in-path attack. Because Bitcoin connections are TCP connections, you can send a packet that says I want to reset a particular TCP connection, and you can drop it. You can inject traffic that will cause the host to break that TCP connection, and then they won’t receive the packet – so you have this ability to drop things. The host still receives the packet. They just discard it because they think the TCP connection has been broken.
But one of the things that you could do is you could put some sort of encryption here. It’s a little bit tricky, but if you assume that you’ve set up keys, the attacker… I guess the question is, and here’s a question for the room. If you put encryption here, does it give you any benefit? So I don’t think encryption would help you against reset attacks. There are other things that you could do. For instance, you could forge packets. Even though you’re not in-path, you still see the TCP header information, send cookies, and whatnot, so you could generate packets from this host without actually being this host. And so, you could say rather than just send a reset connection and break the connection, you could send a message that would get that IP address blacklisted. So personally, I think encryption doesn’t help that much here, but it can limit some of the attacks. But probably what an attacker would want to do is just reset this connection and have you connect to someone else. But if there was a reason, and I think this is kind of an interesting question that needs more thought, is how much does it buy you here? Is blacklisting an IP versus breaking a connection significantly worse for having encryption here? So you establish this encrypted connection, and then the attacker shows up. But then the question is if the attacker then disrupts this encrypted channel and then is like, “oh no, I switched keys!” how do you tell between those states? And so, it seems for a targeted attack that’s not so good, but if someone were doing this at scale, then they’d have to write software to do the crypto and to disrupt these things and then to make sure that your packet didn’t get in first to re-establish the key.
I’ll just leave this as an open problem for everyone to think about: if you did have trust-on-first-use encryption here, or you had something like a VPN connecting these two points, how much does it buy you in terms of security? And does it matter whether your attacker is the Great Firewall of China or the NSA, doing this as a non-targeted bulk attack, versus, say, an attacker that’s attacking your specific node and writing custom software to do it? I just want to throw this in there because it’s kind of an interesting question.
So I believe there are certain packets, bad transactions, and garbage behavior that if you send to a node that node’s IP will be added to a blacklist. I think there’s a point system. I know they’ve changed this recently, or they’ve changed this a few times, but there used to be things that you could send, and you’d have blacklist points. When you hit 100, you’d be blacklisted. I’m not sure when they would back off that blacklist thing, but you could reset the connection and then stick your own key in there. You still wouldn’t be in-path, so these packets would be going out here, but then you’d be racing those packets and trying to keep that connection there. So maybe you send a reset this way, and you send a reset this way. This node doesn’t think it’s talking to the purple node anymore, so it’s just dropping all these packets on the floor, and then you’re talking to the purple node.
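A minimal sketch of the point system as described here, to show why forged traffic is dangerous. Only the threshold of 100 comes from the talk; the class, the method name, the point values, and the example address are hypothetical and are not Bitcoin Core's actual code.

```python
BAN_THRESHOLD = 100  # from the talk: "when you hit 100, you'd be blacklisted"


class MisbehaviorTracker:
    def __init__(self, threshold=BAN_THRESHOLD):
        self.threshold = threshold
        self.scores = {}     # ip -> accumulated points
        self.banned = set()

    def misbehaving(self, ip, points):
        """Add points for a bad message; return True if the IP is now banned."""
        if ip not in self.banned:
            self.scores[ip] = self.scores.get(ip, 0) + points
            if self.scores[ip] >= self.threshold:
                self.banned.add(ip)
        return ip in self.banned


# An on-path attacker forging garbage that appears to come from an honest
# peer's address (a documentation-range IP here) gets that peer banned.
tracker = MisbehaviorTracker()
honest_peer_ip = "203.0.113.5"
print(tracker.misbehaving(honest_peer_ip, 50))  # below the threshold
print(tracker.misbehaving(honest_peer_ip, 50))  # reaches the threshold
```

The tracker only sees source addresses, so it cannot tell a misbehaving peer from packets forged in that peer's name.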
When we were writing the Eclipse attack paper, you would be more likely to connect to a peer that you had connected with recently than in the past. And so what this would do is that if you have a long-lived connection to someone, they would be more likely to connect back to you, and you’d be more likely to connect back to them. And so, it created this cluster of very tightly connected long-lived peers who were all connected to each other. And then new peers couldn’t even get in because all the connections, all the incoming connections, were full. And there was another paper that looked at the peer-to-peer network and observed it wasn’t actually a random graph. It had this weird property. And so, I personally believe that it was that most-recent behavior that accounted for that weird property. Since that’s been removed, no one’s done a study to see whether it’s actually a random graph or not that I’m aware of. I guess a way to answer your question would be to look at whether long-lived nodes have all of their incoming connection slots filled, because if they do, you’d start to have behavior like that, but if they don’t, then a new peer would just potentially connect to one of them. But I think that’s a really interesting question, and studying the Bitcoin network, there’s some surprising emergent behavior.
Off-path attacks
So now we’re going to talk about off-path attacks. This is much more what the Eclipse attack paper did. You fill the node’s peer table with attacker IPs. The node restarts and loses its current outgoing connections. The node makes new connections to only the attacker IPs. So you see, this node has been eclipsed by an attacker.
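As a rough back-of-the-envelope sketch of why filling the peer tables matters: if you assume, as a simplification that is not the model used in the Eclipse paper, that after a restart each of the 8 outgoing connections independently picks an attacker address with probability equal to the attacker's share of the tables, the chance that all 8 land on the attacker is that share raised to the 8th power. The function name is illustrative.

```python
def eclipse_probability(attacker_fraction, outgoing=8):
    """Chance that every outgoing connection goes to an attacker address.

    Simplifying assumption: picks are independent and uniform over the
    tables, ignoring the slash-16 rule and any bias between new and tried.
    """
    return attacker_fraction ** outgoing


for fraction in (0.5, 0.9, 0.99):
    print(fraction, eclipse_probability(fraction))
```

Under this toy model the attacker needs to own nearly all of the table entries before the attack becomes likely, which is why the attack centers on flooding the tables before the restart.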
I’ll just go over this quickly because I’m sure you’re all pretty familiar with it. How easy are restarts? Restarts happen when there’s a new version of Bitcoin. There’s some security patch – you either reboot, or you don’t. In either case, the attacker is happy. If you don’t reboot, they exploit you. I think this should just generally be a principle: the security of the peer-to-peer network should not rely on 100 percent node uptime. We should accept that we have an attacker that can strategically restart your node.
I think that a list of banned IP addresses is probably of some use. A targeted attacker should be able to get new IP addresses. But for example, when Chainalysis was accidentally eclipsing, they eclipsed a bunch of wallets that were connected to the peer-to-peer network that were not actually bitcoind nodes, because the wallets only made three outbound connections and they didn’t have this slash 16 rule. So Chainalysis, to avoid doing an eclipse attack, had all of the IPs that they were performing this attack from be in the same slash 16, so they thought they’d only get one outgoing connection, but because these wallets didn’t implement that rule, they just made all three outbound connections to Chainalysis’ servers. When I first started seeing Chainalysis send me all these packets, they were basically doing the Eclipse attack while I was also doing the Eclipse attack. I was doing it, and I was like “oh good,” and then the results in the paper are actually worse than they should be because there were two people attacking. There was me attacking the node and them attacking at the same time. So they were also trying to fill my peer tables from their slash 16. I looked up their IP address because I had no idea who this person was, and I found on port 8080 on their IP address that it was the Chainalysis law enforcement portal login. And then I typed it into Google, and I saw that there were all of these wallets that had banlists for it, so I contacted one of the maintainers, and I asked, “what do you know about this IP address?” And they said, “oh yeah, our wallets stopped working for a month, and all of our wallets were connected to this slash 16, so we just added it to the banlist, and we haven’t had a problem since.” So I do think banlists can work, especially when you have accidental attackers.
They were trying, I would presume, but I don’t know, to engage in network surveillance, and they just accidentally eclipsed some people, but once they got added to the banned list, they continued their behavior for a while. And then it got written up somewhere, and then they apologized. But the banlist did actually make it so they weren’t able to connect to all of these things. So banlists sort of do work in this interesting area where you’d think they’d switch IP addresses, but they didn’t.
So sometimes things can be incredibly effective, and sometimes things can be really ineffective. Just an anecdote, a similar thing – I had a friend who was in a CTF, and they were trying to get an exploit to run on a server as part of this competition. And they spent a good chunk of their time on it. They’re like, “we know it works. We’ve tested on our own systems, and it won’t work on the other system.” They couldn’t figure it out. It turned out the reason they couldn’t get it to work was that they were using a hostname to do the callback, and on that network, they had just turned off all DNS. So it totally stymied them. They didn’t even think of that. So sometimes, from an attacker’s point of view, an attacker can instantly be like, “oh, I’ll get a new IP address,” or they may not realize what’s going on and just be like, “oh, I guess it doesn’t work anymore.”
It’s surprising how things can break. I feel with all of this networking stuff, no matter how robust the software is, you always have to be careful. When we were doing the Eclipse attack paper, my adviser, who does a lot of network security work, was like, we have to be incredibly careful. All the IP addresses you use to do stuffing have to be unallocated IPv4 addresses, and we have to set up a firewall, so those packets don’t leave. And we modified the victim nodes that we were attacking so they wouldn’t send any address packet out for any of the IP addresses in that unallocated range that we were using. And, at the time, I was like, “oh, maybe it’s a little bit overkill. It’s good to be cautious, but you know.” I was making the code changes, and then we saw that there was a paper, I think it was Bitcoin over Tor isn’t a good idea, and in the appendix, they had an attack. And the attack was if you just broadcast lots of addresses, it’ll fill up this buffer in a Bitcoin node, and then that buffer will get so big that the node will crash. But it will also send out lots of IP addresses, and when it crashes, it’ll forget about that buffer, and then it will get them again, so you have this rolling brown-out effect on the peer-to-peer network. And it was never exploited, and it’s in the appendix of this paper. And when we were doing this stuff, it was a month after it had been patched, but we didn’t even know it existed at the time. So it was actually really good that we were cautious, because if we’d done it earlier and had not been cautious, we could have actually caused some serious problems.
And there’s, I think in #bitcoin-dev chat, if you look up that paper, there’s a thing where someone notices this in the appendix and asks, “have we fixed this?” And I was like, “oh yeah, we independently discovered this a month ago and fixed it.” But I think with all of this stuff you always have to be really careful because when you’re doing unexpected things to any sort of network, you can cause really big disasters accidentally.
Alright, so this is peer table manipulation of the peer-to-peer network. Because outgoing connections need to be in different slash 16s, the attacker needs at least 8 IP addresses in different slash 16s. The new and tried tables limit the buckets that a slash 16 can be in. So you have a slash 16, and it can only be in, let’s say, these 8 buckets. And this is actually a good defense because there are more slash 16s than there is room in these buckets, so if you just own an entire slash 16 and try to attack the Bitcoin network, you’ll limit yourself to these buckets. And then there’s also having the outgoing connections in different slash 16s, so you can’t just buy a slash 16 and then do a massive attack. You have to have a diversity of IP addresses, which for some attackers, is harder.
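The slash 16 rule for outgoing connections can be sketched as follows. This is an illustration of the rule as described in the talk, not Bitcoin Core's implementation; the function names are made up, and the addresses come from documentation and private ranges.

```python
import ipaddress


def netgroup16(ip):
    """Return the /16 network an IPv4 address belongs to, e.g. '198.51.0.0/16'."""
    return str(ipaddress.ip_network(f"{ip}/16", strict=False))


def pick_outgoing(candidates, max_outgoing=8):
    """Pick outgoing peers in order, allowing at most one peer per /16."""
    chosen, used_groups = [], set()
    for ip in candidates:
        group = netgroup16(ip)
        if group in used_groups:
            continue  # already connected to this /16
        chosen.append(ip)
        used_groups.add(group)
        if len(chosen) == max_outgoing:
            break
    return chosen


# An attacker with 20 addresses in a single /16 only ever gets one slot.
same_slash16 = [f"198.51.100.{i}" for i in range(1, 21)]
print(pick_outgoing(same_slash16))

# An attacker with addresses spread over 8 different /16s can take all 8.
diverse = [f"10.{i}.0.1" for i in range(8)]
print(len(pick_outgoing(diverse)))
```

A client that skips the `used_groups` check, like the wallets in the Chainalysis story, will happily make every connection into the same slash 16.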
Questions: inaudible… It used to be more of a thing. When you get allocations from IANA or whatever, they’ll usually try to give you a range rather than a bunch of random IP addresses to make routing tables nicer so that they can compress it and be like, “oh, this AS owns this range,” rather than “this AS owns these thousand random IP addresses,” right? So it is easier if you’re buying in bulk to get them all in the same range. But you can buy random IP addresses from different cloud hosting sites and get them in different slash 16s. And botnets are often very diverse in their IP range. Although, since they usually target consumers, it’s the IP addresses of ISPs that serve consumers. But when we did analysis, and I don’t talk about this here, but we did an analysis of IP diversity, and most botnets have sufficient IP diversity that this doesn’t present much of a problem to them. But if a company just decided to attack Bitcoin using their existing IP addresses, this would present more of a problem.
So one of the assumptions these defenses make is that the node has been running for a while. It has a bunch of honest IP addresses in its tables, and then the attacker shows up. So if this is a timeline, it’s good, and then evil shows up. Not evil shows up first and then good. Because if evil shows up first, they just win, right? The reason we make these assumptions is not because it’s a good assumption to make – this could happen just as easily as this. But if this happens, you’ve already lost. It’s much much harder to defend against. So we assume this because at least with this, you have a fighting chance. So a core assumption is that the IP addresses that you currently have are good, and the IP addresses that you’re going to get in the future are bad.
… So by honest, I mean honestly follows the protocol. But there could be older nodes. It could be we’re looking at off-path attacks, but if you think about on-path attacks, those honest nodes could now be hijacked by someone else that’s injecting packets in there. I’m using the term honest to mean follows the current protocol, not to mean good-intentioned but bad. One of the defenses which is currently deployed is test-before-evict. If you see an IP address in the tried table and a new IP address is going to evict it, test the IP address in the tried table first, and if it’s still up, don’t evict it. And this sort of has this notion that it’s more likely to be good than evil because it’s already in the tried table. This is deployed in Bitcoin now, and what this means is that if your tried table was entirely filled with honest IP addresses that were online, an attacker wouldn’t be able to get anything into the tried table. But that’s probably never going to be the case, both because nodes go offline and also because it would be unlikely for the tried table to be completely filled up with IP addresses. Does that make sense?
So another assumption is that the new table is easy to fill up with trash IP addresses. This is because these IP addresses are not announced. These IP addresses are not tested. I can just have my Bitcoin node, and I can fill an address packet up with a bunch of crazy IP addresses I just made up – that have never been Bitcoin nodes, that will never be Bitcoin nodes, that may be unallocated IP address space. And I can announce them to you and get in your new table. So this argues that the new table is not really a defendable position. Anyone can come along and just fill it with crap.
Question: inaudible… It does, but if I’m up here, I could just connect and then just blast a bunch of nonsense in there. And it’s pretty hard to prevent. Or I don’t know of any good way of preventing that, so generally, this assumption is that we sort of trust the tried table and we don’t really trust the new table. The tried table is really what the attacker has to overcome. They can overcome the new table. The tried table is the defendable position. And so what we want to do is we want to increase the number of IP addresses in the tried table to make the tried table harder to take over. Because if you have nothing in the tried table, it’s super easy to take over.
So there is a countermeasure in Bitcoin called feeler connections. What a feeler connection does is, after you’ve made 8 outgoing connections, every two minutes there’ll be this 9th connection, which is a feeler connection, which will either pull an IP address from new and check to see if it’s online – and if it is online, it’ll add it to the tried table. Or if something is going to be evicted from the tried table, the to-be-evicted IP address will be added to a buffer, and the feeler connection will pull from that buffer to see if that IP address should be evicted. Basically, we have that situation where stuff in the new table gets tested by the feeler connection and ends up in the tried table.
If we just look at how many IPs the attacker has (and these are randomly sampled IPs, so they’re going to be randomly sampled in different slash 8s) against the attacker’s success probability, we can see the red line is default Bitcoin and the blue line is feeler connections, so it’s harder for an attacker. An attacker needs more IP addresses. And when we add feeler connections and test-before-evict, we get out slightly to here, so you need around 9000. This probability analysis was done a while ago, back before these countermeasures were implemented. So this kind of assumes a spherical Bitcoin. Now that they’ve been in Bitcoin for a while, it would be really neat if someone could test this and see what resources an actual attacker needs against the Bitcoin network.
I believe what would happen is that it tries to add, say, IP 1 from new to tried, but there’s something already in tried: IP 2 is in tried. So IP 2 gets added to the test buffer, I forget what it’s called, and the test buffer consists of IP 2 and IP 1, and then the next time a feeler connection comes along, it will pull from the test buffer, and if IP 2 is offline, it will insert IP 1 and evict IP 2. Otherwise, it will drop IP 1. So this test buffer, I think, holds a maximum of 10 entries, and the feeler connection, if the test buffer is not empty, will pull from the test buffer, and then it will go back to new. Currently, there’s one feeler connection. You could add more feeler connections if you want to move stuff from new to tried.
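The IP 1 / IP 2 walk-through can be written down as a small sketch. This follows the description in the talk rather than the actual Bitcoin Core code: the class and method names are made up, the slot is passed in directly instead of being derived from a hash, the 10-entry limit is the speaker's recollection, and `is_online` stands in for a real connection attempt.

```python
from collections import deque

MAX_TEST_BUFFER = 10  # the speaker's recollection; treat as an assumption


class TriedTable:
    def __init__(self, is_online):
        self.slots = {}             # slot -> ip currently in tried
        self.test_buffer = deque()  # pending (slot, incumbent, challenger)
        self.is_online = is_online  # callable ip -> bool

    def add(self, slot, ip):
        """Try to move `ip` into tried; on a collision, queue a test instead."""
        incumbent = self.slots.get(slot)
        if incumbent is None:
            self.slots[slot] = ip
        elif incumbent != ip and len(self.test_buffer) < MAX_TEST_BUFFER:
            self.test_buffer.append((slot, incumbent, ip))

    def feeler_tick(self):
        """One feeler connection: resolve one pending collision, if any."""
        if not self.test_buffer:
            return None  # a real node would probe an address from new here
        slot, incumbent, challenger = self.test_buffer.popleft()
        if self.is_online(incumbent):
            return incumbent  # incumbent stays, challenger is dropped
        self.slots[slot] = challenger  # incumbent is offline, so evict it
        return challenger


online = {"IP2"}
table = TriedTable(lambda ip: ip in online)
table.add(0, "IP2")
table.add(0, "IP1")         # collides with IP2, goes to the test buffer
print(table.feeler_tick())  # IP2 is online, so it keeps its slot
```

The attacker's address only gets in if the incumbent fails the test, which is what ties this defense to the uptime of honest nodes.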
This is an open question that I don’t have an answer to, but my intuition is that there are just not enough nodes that accept incoming connections to completely fill tried. Especially because when something is inserted into tried, it’s mapped deterministically to a particular position in tried. So two IP addresses can just map to the same position. It’s randomized for each node. Even if there were exactly 10,000 nodes in Bitcoin that accepted incoming connections and the tried table was 10,000 IPs big, there would still be some empty room.
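The "still some empty room" intuition is the usual balls-into-bins argument, and it can be checked numerically. The sketch below assumes a flat table where each address maps uniformly at random to one slot using a per-node secret; the real tried table is organized into buckets, so this is only an approximation, and the function names and peer labels are made up.

```python
import hashlib


def tried_slot(node_secret: bytes, ip: str, table_size: int) -> int:
    """Deterministic for a given node, but different across nodes."""
    digest = hashlib.sha256(node_secret + ip.encode()).digest()
    return int.from_bytes(digest[:8], "big") % table_size


def expected_filled(table_size: int, num_addresses: int) -> float:
    """Expected number of occupied slots under uniform random mapping."""
    return table_size * (1 - (1 - 1 / table_size) ** num_addresses)


def simulated_filled(node_secret: bytes, table_size: int, num_addresses: int) -> int:
    """Count distinct slots hit by placeholder addresses peer-0, peer-1, ..."""
    slots = {tried_slot(node_secret, f"peer-{i}", table_size)
             for i in range(num_addresses)}
    return len(slots)


size = 10_000
print(expected_filled(size, size) / size)          # fraction of slots filled
print(simulated_filled(b"node-secret", size, size))
```

With as many addresses as slots, the expected fill is about 1 - 1/e, so roughly 63% of the table, leaving more than a third of the positions empty.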
Question: It’s deterministic for the node but… inaudible… Right, exactly. And this was in the eclipse attack paper because before it was probabilistic and deterministic actually made the attack harder.
Incoming connections
So to eclipse a node, you need to control both incoming and outgoing connections. In the Eclipse attack paper, it was really easy to control incoming connections. You could just make 117 connections from the same IP address, and you totally monopolize incoming connections. I don’t know if anyone tried this attack against the network. I certainly didn’t. But one thing that you could have done in the past in Bitcoin is just take every single node that accepts incoming connections and just fill up all their incoming connections, so when a new node joined the network, they wouldn’t be able to make connections to anyone. To defend against that DoS connection exhaustion attack, Bitcoin, under certain circumstances, now allows new incoming connections to evict old incoming connections. There’s a set of rules for doing this. One thing that’s interesting about this new rule is that when you evict an incoming connection, you’re actually breaking someone else’s outgoing connection. So there’s a question of whether this ends up making the Eclipse attack stronger, where you can break outgoing connections by getting that outgoing connection to be evicted where it’s an incoming connection. I don’t know if that makes sense to everyone because I’m using incoming and outgoing – it’s a little bit confusing. But node A’s outgoing connection is node B’s incoming connection. And maybe if you make a bunch more incoming connections to node B, it’ll break this connection because it’ll replace this incoming connection with that incoming connection. I don’t know if anyone’s looked at whether that helps or not. I don’t know whether it works or not, but I think it’s an interesting area to study. And then the other question is, let’s say you want to do this connection exhaustion attack, at some point, you’re just evicting yourself and locking people out. To perform this connection exhaustion attack, how many IP addresses do we need? I certainly don’t have any hard numbers for that.
And if you were to perform an eclipse attack, how easy would it be to eclipse the incoming connections given this eviction logic?
Incoming eviction logic
So I looked at the source code none of this has been tested, so this is just from me reading the source code a couple of days ago, but this is the rules for evicting an incoming connection. So you create a list of all the incoming connections. You remove the IP addresses with a particular slash 16 – so there’s just some random slash 16 that’s the protected slash 16. And you remove up to 4 IP addresses with that. You remove type 8 addresses with the lowest ping time. And each time you’re removing these addresses, this is the list of connections to evict. You’re now protecting those from being evicted. You remove 4 IP addresses that most recently sent us transactions. 4 IP addresses that most recently sent us blocks. And remove the oldest connections – 50% of the list. And then there’s this prefer evict setting. If any remainders on the list have prefer evict, then you evict that IP. If none of them have prefer evict, you select the IP addresses that have the most connections from a particular netgroup, a particular slash 16, and you evict the youngest connection from that netgroup. So it’d be really interesting to model this and see how well this works. What happens when you make lots of connections to it? How gameable is it? Maybe it’s really robust to attacks. Maybe there are some clever games you can play with this. I don’t know. This is in net.cpp line 857. There’s the set of rules.
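As a starting point for modeling, here is a sketch of those rules as the talk describes them. It is a reading of the talk, not a port of net.cpp: the `Peer` fields and function names are hypothetical, the protected slash 16 is passed in as a parameter, and where the talk leaves a tie-break open (which prefer-evict peer to drop) the youngest is chosen as an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Peer:
    ip: str
    netgroup: str           # the peer's slash 16
    ping_ms: float
    last_tx_time: float     # larger means more recent
    last_block_time: float  # larger means more recent
    connected_at: float     # smaller means older connection
    prefer_evict: bool = False


def protect(candidates, key, count):
    """Remove the `count` best-ranked peers (smallest key) from the eviction list."""
    keep = {id(p) for p in sorted(candidates, key=key)[:count]}
    return [p for p in candidates if id(p) not in keep]


def select_peer_to_evict(incoming, protected_netgroup):
    candidates = list(incoming)
    # Protect up to 4 peers in the protected slash 16.
    in_group = {id(p) for p in
                [p for p in candidates if p.netgroup == protected_netgroup][:4]}
    candidates = [p for p in candidates if id(p) not in in_group]
    candidates = protect(candidates, lambda p: p.ping_ms, 8)           # lowest ping
    candidates = protect(candidates, lambda p: -p.last_tx_time, 4)     # recent txs
    candidates = protect(candidates, lambda p: -p.last_block_time, 4)  # recent blocks
    # Protect the oldest half of whatever is left.
    candidates = protect(candidates, lambda p: p.connected_at, len(candidates) // 2)
    if not candidates:
        return None
    preferred = [p for p in candidates if p.prefer_evict]
    if preferred:
        return max(preferred, key=lambda p: p.connected_at)  # assumption: youngest
    by_group = defaultdict(list)
    for p in candidates:
        by_group[p.netgroup].append(p)
    largest_group = max(by_group.values(), key=len)
    return max(largest_group, key=lambda p: p.connected_at)  # youngest in that group


# 10 long-lived honest peers in distinct /16s, 30 fresh connections from one /16.
honest = [Peer(f"10.{i}.0.1", f"10.{i}.0.0/16", 10 + i, 100 + i, 100 + i, i)
          for i in range(10)]
flood = [Peer(f"198.51.100.{j}", "198.51.0.0/16", 500, 0, 0, 1000 + j)
         for j in range(30)]
print(select_peer_to_evict(honest + flood, "10.0.0.0/16"))
```

In this toy scenario the evicted connection comes from the flooding slash 16, but a model like this makes it easy to try other mixes of ping times, ages, and netgroups and see when an honest peer gets evicted instead.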
I also just wonder how this is justified. A lot of the ideas that people have come up with in this room have been about making sure that we keep the connections to the peers that send us blocks, so it seems that logic is in here. Imagine someone gets booted, they connect to another node, you’re doing a connection exhaustion attack on the whole network, and they get booted from that node too. But that only happens if they’re in this netgroup. It’s interesting. I’d love to see quantitative models of how this works and how attackable it is, especially because it allows you to break outgoing connections. If I were writing an Eclipse attack paper, I would totally be focused on figuring out if I can use this as a tool. This would be my starting place, but I am not writing a new Eclipse attack paper.
Anomaly detection
Can we use detection as a defense? Well, even if something is detectable, it doesn’t necessarily mean that you will detect it or that the system you built will detect it. I don’t know how much of this exists, but it would be really great to have anomaly detection on the Bitcoin network so that if something like what happened with Chainalysis happened again, it would be caught quickly.
Question: When exactly was this? I think this was 2015, early 2015, maybe 2014. At the time, I was obsessively reading my bitcoind net log file, and I added some extra debugging information in there, and I was processing those logs all the time, so I was really looking at it. But if it happened today, I wouldn’t. I don’t observe it that carefully.
I assume some exchanges probably have some anomaly detection. I know at my company we have a whole bunch of alarms, and they’ve caught weird stuff happening on testnet. I don’t know if anyone’s built anything like that, but I think it would be really interesting just to be able to say that something funny is happening on the network: all of a sudden there are all of these new connections. With a targeted attack, you’re definitely going to miss it if you’re not the target, but you could catch something that’s targeting the Bitcoin peer-to-peer network as a whole, some sort of DDoS attack or connection exhaustion DoS attack. Also, if an attacker is aware of your anomaly detection, they can try to shape their attack to bypass it. And maybe there are attacks which are really hard or impossible to do subtly, like if you’re just going to connect over Tor and get a bunch of Tor exit nodes to spam the Bitcoin network. There was this one attack where the attacker wants to eclipse people talking to Bitcoin over Tor. So what they do is they set up a Tor exit node that talks to Bitcoin. And then they talk through all the other exit nodes and get those exit nodes’ IPs banned on the Bitcoin network, so everyone has to talk through their exit node. And then you can eclipse them. This is from a paper from 2015 or 2016 called “Bitcoin over Tor isn’t a good idea” or something like that. I don’t know how effective that attack is today. But in that case, you would notice that all of a sudden a whole bunch of nodes on the network have banned the same list of IP addresses, and you might be like, “why?” For the attack to be successful, you’d want a majority of the nodes which accept incoming connections to have banned those exit nodes. Probably for a 50% chance of the attack succeeding, you want 50% of the nodes to have banned them.
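The "many nodes suddenly banned the same IPs" signal can be sketched in a few lines of Python. This assumes you already have ban lists from a set of nodes you can observe (for example, nodes you operate); that data feed, the `widely_banned` name, and the 50% threshold are all hypothetical choices for illustration.

```python
from collections import Counter


def widely_banned(ban_lists, threshold=0.5):
    """Return the IPs banned by at least `threshold` of the observed nodes.

    `ban_lists` maps a node identifier to the collection of IPs it has banned.
    """
    if not ban_lists:
        return set()
    # Count each IP at most once per node.
    counts = Counter(ip for banned in ban_lists.values() for ip in set(banned))
    needed = threshold * len(ban_lists)
    return {ip for ip, n in counts.items() if n >= needed}
```

An alarm could fire when this set grows sharply between two snapshots, which is the pattern the Tor exit node attack would produce.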
DNS Attacks
There are DNS attacks you can do. As we already discussed, when you first connect to the network, you don’t have anything in your peers table. You have to get IP addresses from the DNS seeders. If an attacker were to control some of the seeders, they could try to perform an attack. An attacker could also just control your local DNS server. So you could just happen to connect to a malicious DNS server, which could happen if you were connecting from a cafe or something. Or you could do a DNS cache poisoning attack. I don’t know if anyone has actually tried this on Bitcoin or looked into it to see how easy it is. I’ll explain this more. The seeders crawl the network to find out what IP addresses to advertise. You could attempt an attack where you tell the seeders about only your IP addresses. I don’t know how easy attack 4 would be, where you stuff the seeders. I believe the seeders have been designed to be resilient to it, but I don’t know if anyone has attempted an in-depth attack on the seeders.
This was an attack I did a while ago. DNS can return about 4,000 IP addresses. Most of the seeders, the honest seeders, were returning 160 IP addresses. You could have an evil seeder that would return 4,000. Even if only one of the seeders was malicious, it could send way more IP addresses than the good seeders. However, this is fixed now. Regardless of how many IP addresses are returned, Bitcoin now limits this to 256 IPs per DNS seed for this reason.
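The effect of the per-seed cap can be illustrated with a short Python sketch. The function name, the dictionary input, and the choice to truncate (rather than, say, sample) are assumptions for illustration; only the 256 limit and the 160 versus 4,000 figures come from the talk.

```python
MAX_ADDRS_PER_SEED = 256  # per-seed limit mentioned in the talk


def collect_seed_addresses(seed_responses, cap=MAX_ADDRS_PER_SEED):
    """Merge seeder responses, keeping at most `cap` addresses per seed.

    `seed_responses` maps a seed hostname to the list of IPs it returned.
    """
    accepted = []
    for addrs in seed_responses.values():
        unique = list(dict.fromkeys(addrs))  # drop duplicates, keep order
        accepted.extend(unique[:cap])        # assumption: simple truncation
    return accepted
```

With one honest seeder returning 160 addresses and one malicious seeder returning 4,000, the uncapped result is about 96% attacker addresses (4,000 of 4,160); with the cap it drops to about 62% (256 of 416), and each additional honest seeder dilutes it further.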
DNS is UDP normally, but DNS supports the behavior where the server can say, “I’ve got more IP addresses than that, make a TCP connection to me or make an additional connection and I’ll give you the rest.” And because, at least at the time I tested this, Bitcoin relied on the underlying OS to do DNS resolution, the OS just did that. So I made a malicious DNS server and then sent 4,000 IP addresses.
…
There’s a little bit of that already built into Bitcoin. When you write something to the new table, you limit the buckets that it can go into based on the slash 16 of the announcement. So that prevents a single IP address from completely flooding the new table, which was a really clever mechanism. I remember reading it and being like, “oh!” At first, I was like, why are these buckets here, what is this? What are these net groups now? When I actually figured out why they were there, I was like, “oh, that’s really smart!”
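A rough Python sketch of the idea: the bucket an address lands in depends on a secret key and on the /16 of the peer that announced it, so one announcer can only ever reach a bounded number of buckets. The constants, the hash construction, and the function names here are illustrative assumptions, loosely modeled on the mechanism described, not a copy of Bitcoin's address manager.

```python
import hashlib
import ipaddress

NEW_BUCKET_COUNT = 1024        # illustrative table size
BUCKETS_PER_SOURCE_GROUP = 64  # illustrative per-announcer limit


def netgroup(ip):
    """Return the /16 network of an IPv4 address, used as its netgroup."""
    net = ipaddress.ip_network(f"{ip}/16", strict=False)
    return str(net.network_address)


def _h(*parts):
    """Hash the parts together and return a 64-bit integer."""
    data = b"|".join(p if isinstance(p, bytes) else str(p).encode() for p in parts)
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def new_table_bucket(secret, addr, source):
    """Pick the new-table bucket for `addr` as announced by `source`."""
    src_group = netgroup(source)
    # The announced address only selects one of a few slots per source group...
    slot = _h(secret, netgroup(addr), src_group) % BUCKETS_PER_SOURCE_GROUP
    # ...and the (source group, slot) pair determines the actual bucket.
    return _h(secret, src_group, slot) % NEW_BUCKET_COUNT
```

However many addresses a single /16 announces, they fall into at most `BUCKETS_PER_SOURCE_GROUP` buckets, so the rest of the table stays out of that announcer's reach.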
So if you control the local DNS server, you could do direct DNS cache poisoning. When someone makes a DNS request, and the DNS server responds, it responds with this time to live value, also known as TTL, which is how long to store these results on your local machine. What someone could do is, you’re connecting to the Bitcoin peer-to-peer network for the first time from, you know, some Wi-Fi hotspot. Actually, let me try that a second time. Let’s say you walk into a cafe. You haven’t installed Bitcoin on your laptop yet, and you talk to the cafe’s Wi-Fi. It makes you open a page, and hidden in that page are the DNS addresses for the seeders in an iframe. So your laptop now makes a DNS query to the local DNS server about who that is, and the local DNS server responds, “oh, it’s just these IP addresses,” and it puts a two-year TTL on them. You’re going to cache that result and use it later when you do a DNS lookup. I’m not actually sure this works. I know Bitcoin has some of its own DNS code, and I know it differs by operating system, so I haven’t tested that particular attack, but that is a general attack on DNS that people have done for malware distribution. You can poison someone’s DNS cache ahead of time using an approach like that. So when they do start Bitcoin up, they just have these DNS results.
Or when the DNS server is looking it up, you could poison it at this point. This DNS server will cache things. So if you can inject packets in here, you can poison the DNS cache here, and then this will announce it to everyone. So if you manage to poison Comcast’s DNS servers for the Bitcoin seeders, then everyone who created a fresh Bitcoin node would get only the attacker’s IP addresses, and you’d eclipse them all. In regular DNS, this stuff is not protected, so as an off-path attacker, you can inject packets, you can spoof packets, so at least in theory, it should be possible to cache poison a DNS server. DNSSEC is designed to prevent this and has been somewhat rolled out, but it’s not rolled out universally. The high-level takeaway from this is that there’s a whole bunch of really bad things you can do with DNS, and a lot of these things are not under your control.
… You could not use DNS for this, and I believe some other cryptocurrencies don’t use DNS. My understanding of the reason that DNS is used is that you only talk to your local DNS server in many cases. If there were just one server that a Bitcoin node requested from the first time it booted up, then the access logs of that server would be a list of everyone who runs Bitcoin Core. Whereas DNS provides this layer of indirection that kind of hides who is making queries. I don’t know how good that actually is from a privacy standpoint, but the idea is that it does provide some benefit: there’s not just a server out there with a list of everyone who runs a node. That’s why they use DNS versus a TLS connection to some server that gives you a JSON list of all the IP addresses.
…the hard-coded IP addresses. How are those generated? And how many of them were online? My labmate was doing these really fun UDP fragmentation attacks for injecting traffic as an off-path attacker and doing DNS cache poisoning attacks where you can just inject. You request it from some resolver, and then it sends out a packet, and you do a fragmentation attack to inject your packet without being on-path or in-path. Because it’s UDP, it doesn’t have the same protections. So one idea to solve this is that you could try to authenticate the seeders. Some of the DNS response could actually be a signature on the other IP addresses. So you could encode a signature in the first 2 IPv4 addresses or the first couple of IPv6 addresses. So you could make this signed data and sort of just shoehorn signed data into DNS. Since you’re already hard-coding the DNS seed, you could also just hard-code a pubkey for that DNS seed. Or you could look at turning off DNS caching, so you always make a new request. I haven’t looked at the DNS code in a while, so maybe caching was turned off. I don’t actually know that this isn’t already fixed. But it’s an interesting thing to look at. If you’re asking DNS seeds, you should almost definitely not be using whatever cache you have locally. You should at least be going off to your local resolver. You could actually just do the DNS lookup within Bitcoin itself, but almost all applications rely on the OS to do that for you.
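To make the "shoehorn signed data into DNS" idea concrete, here is a small Python sketch that packs arbitrary bytes, such as a signature, into IPv6 addresses and recovers them. The helper names are hypothetical, and signing and verification are deliberately left out; this only shows the encoding step. For scale, a 64-byte signature would occupy 4 IPv6 addresses or 16 IPv4 addresses.

```python
import ipaddress


def bytes_to_ipv6(payload):
    """Encode `payload` as a list of IPv6 address strings, 16 bytes each."""
    padded = payload + b"\x00" * (-len(payload) % 16)  # zero-pad last chunk
    return [
        str(ipaddress.IPv6Address(padded[i:i + 16]))
        for i in range(0, len(padded), 16)
    ]


def ipv6_to_bytes(addresses, length):
    """Decode the first `length` payload bytes from IPv6 address strings."""
    raw = b"".join(ipaddress.IPv6Address(a).packed for a in addresses)
    return raw[:length]
```

A real scheme would also have to cope with resolvers reordering or dropping records, so the position of the signature-carrying addresses could not simply be assumed.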
Conclusion and project ideas
So I feel like I’ve talked about all of this stuff. In general, here are some project ideas, things that I think would be cool if people worked on. One is hardening the seeders against attacks, thinking about how to make on-path attacks harder. UDP has some interesting properties for on-path attacks. With UDP it’s usually easier for an off-path attacker to spoof packets because it’s stateless, but if two parties send you UDP packets, you can’t necessarily do these reset attacks where they just break the connection, and the person will drop packets. And then there’s the incoming connection logic: is there any way to game this? How optimal is it? Is there some fine-tuning that would make it more secure?