Lightning Network Security Analysis
Transcript By: Bryan Bishop
Security analysis of the lightning network (2017)
Olaoluwa Osuntokun (roasbeef), Lightning Labs
My name is Laolu Osuntokun and I work on lightning stuff. I go by roasbeef on the internet. I work at Lightning Labs. I am going to be giving an overview of some security properties and privacy properties of lightning.
State of the hash-lock
Before I start, I want to go over the state of the hash-lock which is the current progress to completion and design of lightning. We have a set of specifications called lightning-rfc where a bunch of collaborators get together and work on lightning. You should be able to implement lightning fully from the specs. It goes through everything including the funding process, key derivation, p2p interaction, messages, etc. We have multiple implementations being tested on bitcoin’s testnet currently. There are four implementations. I work on lnd, one of many. There’s a testnet lightning faucet right now and this is basically like a way you could get some bitcoin. Instead of getting on-chain coins, you get a channel opened up with you and it’s kind of a cool way to get started with lightning because you go to the website, you have the node up, you get a channel set up, and then you can use that channel to make payments on the lightning network. This gives you a gateway to the network and using that gateway you can send payments around to whoever else. Another cool thing that happened recently is that litecoin is going to be getting a malleability fix so lightning can work on litecoin. Also work is occurring for trying to get lightning to work with mimblewimble, that’s alchemy, really cool stuff.
Alright. So here’s my brief outline. I’m going to do lightning in two slides because we’re going to be moving kind of quickly so I figured I would explain lightning in two slides. Talk about some assumptions about the liveness of lightning, then I will go into the peer-to-peer networking layer of the system, and how we have privacy within routes themselves, how hash-lock decorrelation works in lightning (something of a bug that we plan to fix), and then I’ll talk about something called blinded channel outsourcing.
So lightning in the first slide… basically, this is the way that bitcoin works currently, right. You basically start with Alice and Bob. Alice wants to pay Bob. They both connect to the bitcoin network. Bob gives Alice a bitcoin address. Alice then creates a transaction paying that address and broadcasts it to the bitcoin network. Even though this interaction was between Alice and Bob, in bitcoin the entire network needs to be involved. Miners get the transaction, they confirm the transaction, and then it gets broadcasted to everybody. Even though Alice and Bob were the only people involved in this payment, every single person in the network has to do this work. This is inefficient. We have to involve an entire network just because of a private contract between two people.
There are some issues with this because now every single payment that is completed will be added to the blockchain and increase the size of the blockchain. It takes up bandwidth because everyone has to copy it to everyone else. Another thing is that confirmation times are kind of unpredictable and not instantaneous. There’s a randomized schedule by which blocks get confirmed on and we don’t know how long it is going to take for Alice to send the money to Bob. They could do a zero-confirmation bitcoin transaction but that could be risky.
Inherently, global broadcast doesn’t scale. If we want to have millions and millions of users, we can’t have this scheme of connecting out to everyone else. Lightning is a solution to this problem. It’s basically a system for off-chain payments using off-chain contracts that are anchored in the bitcoin blockchain. Using these off-chain contracts, we can make the system more efficient and more private because all of the information isn’t on the chain. In bitcoin without this, there’s an entire transaction graph available from the blockchain that anyone can trace through, while with lightning everything is more efficient and off-chain.
Basically we have to make a funding transaction where Alice and Bob put money into the transaction contract itself and from that setup it’s locked in that state. But before we actually broadcast the funding transaction, and this is important, we also create a commitment transaction which delivers the final state to Alice and Bob. Without this, Alice and Bob wouldn’t really be able to get their money back to each other. A malleability fix is really useful here because if we didn’t have one, then the funding transaction could be malleated and the commitment transaction would be invalidated meaning that Alice would have some random thing and money gets stuck.
Once the funding contract has been confirmed, we move into the off-chain payments state. We use something called a hash-time-lock-contract (HTLC) which is similar to something that was presented at this conference the other day, basically a claim and refund process. Basically Alice creates a conditional payment to Bob based on Bob claiming the payment with a certain preimage. And if it times out, and Bob doesn’t present the preimage within the right time window, then Alice gets her money back. At the end they have something called a closing transaction, which commits the final state to the bitcoin blockchain.
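The claim-or-refund behaviour of an HTLC described above can be sketched roughly as follows. This is a minimal Python model and not real bitcoin script: the class name HTLC, its fields, and the use of a plain block height for the timeout are assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class HTLC:
    """Toy model of a hash-time-locked contract (illustrative names only)."""
    payment_hash: bytes    # SHA-256 of a secret preimage chosen by the receiver
    amount: int            # amount locked in the conditional payment
    timeout_height: int    # height from which the sender may take a refund
    settled: bool = False

    def claim(self, preimage: bytes, height: int) -> bool:
        """Bob claims by revealing the preimage before the timeout."""
        if self.settled or height >= self.timeout_height:
            return False
        if hashlib.sha256(preimage).digest() != self.payment_hash:
            return False
        self.settled = True
        return True

    def refund(self, height: int) -> bool:
        """Alice gets her money back once the timeout has passed unclaimed."""
        if self.settled or height < self.timeout_height:
            return False
        self.settled = True
        return True


# Bob picks a secret, Alice locks funds to its hash, Bob claims in time.
preimage = b"bob's secret"
htlc = HTLC(hashlib.sha256(preimage).digest(), amount=1000, timeout_height=500)
claimed = htlc.claim(preimage, height=450)
```

Once one branch has been taken the other is closed: after a successful claim, a later refund attempt returns False.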
The cool thing about this is that the on-chain footprint is really minimized, we have only two transactions in the ideal state. All the updates are point-to-point between two individuals and we don’t need to involve the entire bitcoin network for every single update. We also have reasonable fees in comparison. We’ll see later how the fees are predictable and how that works.
So that’s lightning.
I’m going to talk about the difference in security models between lightning and bitcoin. So, lightning actually uses bitcoin but it could be used on many other chains as well. And we basically rely on bitcoin for the ordering of transactions. This is basically so that we know which transaction happened before and after, and we can base our protocol on this.
There’s a new component in lightning which is a time-based component where there’s a certain window of action. If your counterparty defects (a channel breach), then you have a certain amount of time where you can act and bring the state back to your favor even though the counterparty defected from the original protocol. This is a new component. You can configure this time parameter, it’s a relative delay, and if you have a longer delay then that means you have more time to act in order to bring your counterparty to justice. The other thing, though, is that when you increase this delay it also affects something called unilateral close, which is basically where you just broadcast the commitment transaction yourself; because of some security mechanisms in there, you also have to wait longer there. In the optimistic case, where basically everything is fine and everyone works together, you can increase the value of this T security parameter (CSV delay) and this allows you to have more safety in the rare case that someone tries to cheat you.
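The trade-off around the delay T can be made concrete with a little arithmetic. The sketch below assumes T is counted in blocks and that an output with a relative delay becomes spendable T blocks after the commitment transaction confirms; the function names and the example heights are hypothetical.

```python
def delayed_output_spendable_at(confirm_height: int, csv_delay: int) -> int:
    """First block height at which an output with a relative delay can be spent."""
    return confirm_height + csv_delay


def blocks_left_to_respond(confirm_height: int, csv_delay: int, tip_height: int) -> int:
    """Blocks remaining before the broadcaster's delayed output unlocks."""
    return max(0, delayed_output_spendable_at(confirm_height, csv_delay) - tip_height)


# A commitment confirms at height 1000 and we notice it at height 1010.
# With T = 144 blocks (about a day) the window to react is fairly short.
window_short = blocks_left_to_respond(1000, 144, tip_height=1010)

# With T = 1008 blocks (about a week) there is far more time to act, but an
# honest unilateral close also waits that long before its funds are usable.
window_long = blocks_left_to_respond(1000, 1008, tip_height=1010)
```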
There’s some failure modes in this design. One is called a thundering herd failure mode where maybe there’s some kind of issue in the client or a bug or something happens on the network and everyone tries to close all of their channels at once. Because the chain only has a finite capacity then we can only get so many closures in during a period of time. As soon as the commitment transaction gets into the chain, then the clock starts ticking. So if you’re not able to get your transaction in before the clock expires, then the adversary could potentially take your money and profit, but this depends on the time value of T which is unique per channel, the backlog, and how the attack is going. So this T can be 1 week, 2 months, it can be as long as you want, and this is your safety mechanism. It’s probably going to be unlikely that you won’t have access to the blockchain for like 1 month because that’s basically a massive attack on bitcoin and maybe you have like a $1 channel, but we’ll see about that.
There’s a few proposed solutions. One possible solution is called “time-stop” and this was proposed by gmaxwell and Joseph Poon and it basically allows you to stop the ticking clock when the block reaches a high water mark. So basically if a block is over 75% capacity or 80% capacity then the time stops ticking, and this gives you a security margin; if the block isn’t really full then we can continue ticking the clock forward.
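The time-stop idea can be sketched as a clock that only counts blocks at or below the high water mark. This is a toy model of the proposal as described in the talk, not an implementation of any deployed consensus rule; the function names, the 0.75 default, and the sample block fullness values are all assumptions.

```python
def effective_elapsed(block_fullness, high_water_mark=0.75):
    """Count only blocks that are not over the high water mark."""
    return sum(1 for fullness in block_fullness if fullness <= high_water_mark)


def timeout_expired(block_fullness, delay_blocks, high_water_mark=0.75):
    """The delay expires only after enough non-congested blocks have passed."""
    return effective_elapsed(block_fullness, high_water_mark) >= delay_blocks


# Six blocks since the commitment confirmed; three of them were nearly full.
fullness = [0.5, 0.99, 1.0, 0.98, 0.6, 0.4]

# A plain block-count clock would say a 4-block delay is over after 6 blocks.
plain_clock_expired = len(fullness) >= 4

# With time-stop only 3 blocks count, so the delay has not expired yet.
time_stop_expired = timeout_expired(fullness, delay_blocks=4)
```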
Another possible mitigation that I was talking about with people here earlier today is that you can create a consensus-enforced dependency between the commitment transaction and the exit transaction. This means that they must get into the block at the exact same time. You might be able to use generalized covenants or, Bram had some ideas about a new opcode that could make sure the transactions themselves are kind of dependent on each other. Another way is to possibly set the fee structure on the commitment transaction and the exit transaction such that the commitment transaction has insufficient fee on its own and you have to use the exit transaction to pull it into the chain itself. This doesn’t work if the attack is performed by miners, but it could still be effective.
Peer-to-peer networking layer
Moving along to the p2p network layer. All connections between all nodes in the lightning network are always encrypted and authenticated at all times. They are encrypted for the entire duration. Nodes never send cleartext protocol messages. They don’t give away version information or anything like that. We use something called brontide (BOLT 8). It’s one of the specifications. Brontide is a variant of the noise protocol framework which is used by Signal and WhatsApp and some other protocols. We tweak it a little bit to suit our needs. We also have a protocol for authenticated key agreement, it uses a series of hash functions and ECDH to do this triple diffie-hellman thing to derive shared secrets and keys. We use something called authenticated encryption with associated data (AEAD) which wraps up authentication and encryption all in one. We can get some cool security properties from that. The other thing about the noise framework is that it solves this kind of man-in-the-middle-attack when you’re sending keys over the channel because every single payload sent over the channel is actually authenticated. So if someone tries to swap out a key that I’m sending, then the MAC check will fail and then we won’t move forward. In the noise framework, you can have parameters on which cryptosystem you’re using for encryption, hash functions, and which group you’re using. We use secp256k1 which is the bitcoin curve, ChaChaPoly for the authenticated encryption, and sha256 as our hash function. On the right in the diagram, that’s the packet structure. We do some things like we encrypt the message length of the packet as well, which you could say maybe makes traffic analysis more difficult because now you can’t easily delineate where one packet ends and the next begins. We also rotate keys every few messages and this gives us forward secrecy in the sense that if my keys get compromised then they can’t go and decrypt every single message that I sent in the past before that.
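The key rotation mentioned above can be sketched with a hash-based key derivation function. The code below is a minimal illustration using HKDF (RFC 5869) over SHA-256 with an empty info field; the class name SendingKey, the rotation interval of 1000 messages, and the choice of the chaining key as salt are assumptions that should be checked against BOLT 8 rather than taken as the specification.

```python
import hashlib
import hmac

ROTATION_INTERVAL = 1000  # assumed number of messages between rotations


def hkdf_sha256(salt: bytes, ikm: bytes, length: int = 64) -> bytes:
    """HKDF extract-then-expand (RFC 5869) with an empty info field."""
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


class SendingKey:
    """Tracks one direction's encryption key, nonce, and chaining key."""

    def __init__(self, chaining_key: bytes, key: bytes):
        self.ck, self.k, self.nonce = chaining_key, key, 0

    def use(self):
        """Return (key, nonce) for one message, rotating when the interval is hit."""
        key, nonce = self.k, self.nonce
        self.nonce += 1
        if self.nonce == ROTATION_INTERVAL:
            out = hkdf_sha256(self.ck, self.k)
            # The old key is overwritten, so it cannot decrypt later traffic
            # and a later key cannot be run backwards to recover it.
            self.ck, self.k, self.nonce = out[:32], out[32:], 0
        return key, nonce


state = SendingKey(chaining_key=b"\x01" * 32, key=b"\x02" * 32)
first_key = state.k
for _ in range(ROTATION_INTERVAL):
    state.use()
rotated_key = state.k
```

Because the derivation is one-way, someone who compromises the current key cannot derive the keys that protected earlier batches of messages.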
Now more about the network itself. In lightning, nodes are identified by their public keys. These keys along with bitcoin keys are used to authenticate certain information that is sent over the network. The first thing that we have is a node announcement message that says hey I’m on the network, here’s my public key, here’s my reachability information, here’s a signature over that information to prove that I actually own the key, and then during that the node advertises what we call “global features” which are basically like this is the type of HTLC that I support, maybe we can also use this to phase in new features into the protocol itself.
Next we have the channel announcement (channel proof) message. The channel announcement is a channel proof. We want to stop adversaries from being able to flood in bogus data. So what we do is that we ensure any channel we see we always authenticate it. In order for it to be authenticated, we need them to present us a proof, which tells us where the channel is in the chain. This is finding the outpoint and what’s created in that outpoint. This involves 4 keys (two multisig keys, two node keys). So you find the outpoint in the bitcoin blockchain, you then verify that this scripthash script actually commits to those two keys in the announcement. We have four signatures. Two of them are from the keys in the chain to prove that you can control and do updates in the chain. The other two signatures are the signatures of the nodes themselves to prove that we have this relationship and that we actually have this relationship with this chain so there’s a 4-way relationship that is going on.
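The on-chain half of that check, confirming that the funding output commits to the two announced bitcoin keys, can be sketched as below. It assumes the funding output is a pay-to-witness-script-hash of a 2-of-2 multisig script with the keys in sorted order; the function names are hypothetical, the example keys are placeholder bytes rather than real curve points, and verification of the four signatures is left out entirely.

```python
import hashlib

OP_2 = 0x52
OP_CHECKMULTISIG = 0xAE
PUSH_33 = 0x21  # push opcode for a 33-byte compressed public key


def funding_script(pubkey_a: bytes, pubkey_b: bytes) -> bytes:
    """2-of-2 multisig script over two compressed keys, in sorted order."""
    k1, k2 = sorted([pubkey_a, pubkey_b])
    return (bytes([OP_2, PUSH_33]) + k1 + bytes([PUSH_33]) + k2
            + bytes([OP_2, OP_CHECKMULTISIG]))


def p2wsh_script_pubkey(witness_script: bytes) -> bytes:
    """Witness script hash output: version 0 followed by a 32-byte SHA-256."""
    return bytes([0x00, 0x20]) + hashlib.sha256(witness_script).digest()


def output_matches_announcement(onchain_script: bytes, key_1: bytes, key_2: bytes) -> bool:
    """Does the output found at the outpoint commit to the announced keys?"""
    return onchain_script == p2wsh_script_pubkey(funding_script(key_1, key_2))


# Placeholder 33-byte keys (not real public keys).
key_a = bytes([0x02]) + bytes([0x11]) * 32
key_b = bytes([0x03]) + bytes([0x22]) * 32
onchain = p2wsh_script_pubkey(funding_script(key_a, key_b))
genuine = output_matches_announcement(onchain, key_b, key_a)
bogus = output_matches_announcement(onchain, key_a, bytes([0x02]) + bytes([0x33]) * 32)
```

A node that sees an announcement whose keys do not reproduce the script at the claimed outpoint can drop it without ever checking a signature.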
Right now we have four signatures, which could be compressed, and we could use things such as Schnorr signatures and maybe combine signatures together or we could do it on a global basis if we were using a pairing-based signature scheme which would allow us to compress all of the signatures into a single signature.
Finally, we have the channel update announcement message. Within lightning, it’s a graph with channels and the channels connect to different nodes. But it’s actually a directed graph because you can flow funds in one direction or the other direction. Each node controls which direction it can allow funds in. I can choose not to advertise my channel at all, or I could advertise certain things like my routing policy which includes timelock, fees, and what type of payments I accept. This is all signed. So what we have is basically an authenticated data structure of the channel graph itself. Nobody can forge any updates to the channel graph, and every node can use the blockchain and all the updates and we can ensure that we all have the same view of everything.
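One way to picture the directed, authenticated channel graph is as a map from (channel, direction) to the policy that the sending side has signed. The sketch below is illustrative only: the Policy fields, the ChannelGraph class, and the signature_ok flag (a stand-in for real signature verification against the node key) are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Routing policy one node advertises for one direction of a channel."""
    fee_base_msat: int
    fee_rate_ppm: int
    timelock_delta: int


class ChannelGraph:
    def __init__(self):
        # (channel_id, from_node, to_node) -> Policy; each direction is separate.
        self.edges = {}

    def apply_update(self, channel_id, from_node, to_node, policy, signature_ok):
        """Accept an update only if its signature verified (checked elsewhere)."""
        if not signature_ok:
            return False
        self.edges[(channel_id, from_node, to_node)] = policy
        return True

    def policy(self, channel_id, from_node, to_node):
        return self.edges.get((channel_id, from_node, to_node))


graph = ChannelGraph()
graph.apply_update("chan-1", "alice", "bob", Policy(1000, 100, 40), signature_ok=True)
graph.apply_update("chan-1", "bob", "alice", Policy(0, 10, 144), signature_ok=True)
forged = graph.apply_update("chan-1", "alice", "bob", Policy(0, 0, 1), signature_ok=False)
```

The two directions of the same channel carry independent policies, and an update that fails verification leaves the stored policy untouched.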
Onion encoded payment routes (sphinx)
In lightning, we actually try to make payments more secure because if we didn’t make them secure then possibly you could have censorship in the network itself where people drop payments or maybe some adversary would be able to collect data. We use sphinx to do onion routing in the lightning network. Onion routing is where you have some data, you wrap it up in encryption a bunch of times, and then using that, only if the decryption proceeds successfully can you get the actual data out of it. This is specified in BOLT 4 and it’s a fixed-size payload.
This gives us a number of cool security features. When a node gets this packet, and because it’s fixed size, it doesn’t actually know its position in the ultimate route. It doesn’t know how long the route is because even if it’s just a 2-hop payment, the system will always encode it to the full size of the packet. And the nodes only know their predecessor and their successor. And because we randomize the packet at each hop, the packets are indistinguishable from each other. You get a packet, and you don’t necessarily know that it’s the one that you got before, because we re-randomize them.
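The fixed-size property can be illustrated with a toy onion. This is a heavily simplified sketch and not the Sphinx construction from BOLT 4: it omits the per-hop MACs, the ephemeral key blinding, and the filler computation, and the keystream is just SHA-256 in counter mode. The sizes, key values, and function names are all assumptions; in the real protocol each hop key would come from an ECDH shared secret.

```python
import hashlib

PACKET_SIZE = 100  # fixed size of the routing blob (toy value)
HOP_SIZE = 20      # fixed size of each per-hop payload (toy value)


def stream(key: bytes, n: int) -> bytes:
    """Toy keystream: SHA-256 in counter mode."""
    out, counter = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def build_onion(hop_keys, payloads) -> bytes:
    """Sender wraps layers starting from the last hop and working backwards."""
    packet = bytes(PACKET_SIZE)  # innermost padding
    for key, payload in zip(reversed(hop_keys), reversed(payloads)):
        plain = payload.ljust(HOP_SIZE, b"\x00") + packet[:PACKET_SIZE - HOP_SIZE]
        packet = xor(plain, stream(key, PACKET_SIZE))
    return packet


def peel(key: bytes, packet: bytes):
    """A hop strips its layer, reads its payload, and re-pads to full size."""
    plain = xor(packet + bytes(HOP_SIZE), stream(key, PACKET_SIZE + HOP_SIZE))
    return plain[:HOP_SIZE], plain[HOP_SIZE:]


hop_keys = [b"key-bob", b"key-carol", b"key-dave"]
payloads = [b"forward to carol", b"forward to dave", b"final hop"]
packet = build_onion(hop_keys, payloads)

sizes, seen = [], []
for key in hop_keys:
    sizes.append(len(packet))
    payload, packet = peel(key, packet)
    seen.append(payload.rstrip(b"\x00"))
```

Every hop receives a blob of the same length and forwards one of the same length, so packet size alone says nothing about position in the route. This toy supports at most PACKET_SIZE // HOP_SIZE hops.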
We also do a cool thing where we re-use the shared secret in the onion circuit to encrypt error messages back to the sender which helps us compartmentalize who knows what data. Maybe if you know why the payment failed, and you were an intermediate node, then you get more information about the system.
Onion encoded payment routes
Getting into a little bit more of how we use sphinx in lightning… We use something called source routing which means that the sender can curate the entire route that the payment takes. This is really good because it gives the sender total control about the route. They know how long it may take, they know who they’re sending it through, and they know how many fees they will be paying, and this is really good because you can have predictable fees. One of the ways that we extended sphinx is that we added this per-hop payload. This is authenticated end-to-end. This payload gives each node information about how to forward the payment to the next hop. It gives them basically what timelock they should use, the amount they should forward, ensure they get the proper fee, and say you got this on the bitcoin link but forward it to the litecoin link. All of this information is authenticated and no node is able to tamper with this.
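Because the sender knows each hop's advertised policy, it can compute the per-hop amounts and timelocks ahead of time by working backwards from the destination. The sketch below assumes a fee of a fixed base plus a proportional part in parts per million, and a timelock delta added at each hop; the field names, the node names, and all numbers are made up for illustration.

```python
def build_hop_payloads(final_amount_msat, final_expiry, intermediaries):
    """Work backwards from the destination.

    intermediaries: forwarding nodes in route order, each a dict with
    'node', 'fee_base_msat', 'fee_rate_ppm' and 'timelock_delta'.
    Returns (amount the sender sends, sender's expiry, per-hop instructions).
    """
    amount, expiry = final_amount_msat, final_expiry
    payloads = []
    for hop in reversed(intermediaries):
        # What this node is told to forward on its outgoing link.
        payloads.append({"node": hop["node"],
                         "forward_msat": amount,
                         "outgoing_expiry": expiry})
        # What must arrive at this node: forwarded amount plus its fee,
        # with a timelock that is later by its advertised delta.
        amount += hop["fee_base_msat"] + amount * hop["fee_rate_ppm"] // 1_000_000
        expiry += hop["timelock_delta"]
    payloads.reverse()
    return amount, expiry, payloads


# Alice pays Dave 1,000,000 msat through Bob and then Carol.
route = [
    {"node": "bob", "fee_base_msat": 1000, "fee_rate_ppm": 100, "timelock_delta": 40},
    {"node": "carol", "fee_base_msat": 2000, "fee_rate_ppm": 500, "timelock_delta": 144},
]
send_amount, send_expiry, hop_payloads = build_hop_payloads(1_000_000, 600_010, route)
total_fee = send_amount - 1_000_000  # 3600 msat, known before anything is sent
```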
Another important feature we have is protection from replay attacks. Without replay attack protection, someone could use an old packet from an old payment and inject it back into the network and see where it propagated at. We have a few things that we use to prevent that. One thing is that we commit to the payment hash inside of the sphinx packet itself. You can’t take the sphinx packet and add another payment hash to it. If you wanted to try to replay, you would have to re-use the payment hash, and if you re-use the payment hash basically if people remember the payment hash then you lost money. Another thing that we can do is that we can have a log of messages that we have received. Because HTLCs have a timeout, we have an upper bound of when they could forget about that message. So say you keep it around for 1 day and then after that day you abandon it and then you know things are okay. This isn’t perfect because the network is still subject to timing and traffic analysis, meaning if we could watch it and monitor it and look at packets moving around then we would be able to ascertain some information about the payment path. The other thing is that if we have poor path diversity, meaning that the graph isn’t very well connected, then an adversary could figure out where a payment is going. Where if Dave is on some island and connected with only a single link then maybe we could know that I’m forwarding it to Dave or possibly if we have different payment channel capacity amounts then I can rule out which possible paths a certain payment is going to go on.
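The bounded replay log can be sketched as a table of packet identifiers that are forgotten once the corresponding HTLC could no longer be valid. The identifier used here (a hash over the onion bytes and the payment hash), the class name ReplayLog, and the use of block heights as the expiry are assumptions for illustration.

```python
import hashlib


def packet_id(onion_packet: bytes, payment_hash: bytes) -> bytes:
    """Identifier binding a packet to the payment hash it was sent with."""
    return hashlib.sha256(onion_packet + payment_hash).digest()


class ReplayLog:
    def __init__(self):
        self._seen = {}  # packet id -> height at which the HTLC times out

    def accept(self, pid: bytes, htlc_expiry: int, height: int) -> bool:
        """Return True only the first time a still-valid packet is seen."""
        # Entries whose HTLC has timed out can safely be forgotten.
        self._seen = {k: v for k, v in self._seen.items() if v > height}
        if htlc_expiry <= height:
            return False  # already expired, nothing to forward
        if pid in self._seen:
            return False  # replay of a packet we already processed
        self._seen[pid] = htlc_expiry
        return True


log = ReplayLog()
pid = packet_id(b"onion bytes", b"payment hash")
first = log.accept(pid, htlc_expiry=700, height=650)
replay = log.accept(pid, htlc_expiry=700, height=660)
late = log.accept(pid, htlc_expiry=700, height=701)
```

The timeout is what keeps the log from growing forever: once the height passes the expiry, the entry is dropped and the stale packet is rejected on expiry alone.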
This is kind of like one thing that is not fixed in the current 2017 version of lightning. We know how to fix it. It’s something called hash-lock correlation. This is a path. We have 5 nodes. Alice, Bob, Dave, Carol and Eve. Pretend that Bob and Dave are collaborating or colluding or they are the same person. Because the hash-lock is the same throughout the entire route, Bob and Dave know that oh this is that same payment that I got from Alice and using this they can get more information about it. This kind of breaks our unlinkability because they know this is the payment they received and they can get more information.
But we know how to solve this. The problem is that the payment hash of the hash-lock is the same for the entire route. So we could basically decorrelate that by re-randomizing it at each hop, similar to something that sphinx does. In sphinx, it has a very compact size and achieves that by using the same group element of the public key, and every node randomizes their public key along the route. This is one construction that was recently brought to the mailing list to switch from hash-locks to “key-locks” for path length y. Some people suggested using SNARKs and that’s really heavy and it could take seconds to generate it. There were some other ideas like one-time use signatures and so on. But here’s a version here I call “key-locks”. Instead of using hashes, we use public keys to propagate payments. Pretend that Q is the key you received on the incoming HTLC. It’s in the payload too, the payload would be extended to carry this information. And there would be a scalar called R which is also in the payload. Using Q and R you could get a point P, and because of this mathematical property in elliptic curves themselves you know that Q and P are related by that scalar value. You know the private key for Q is actually the private key for P plus R, so you know if you get the private key for P then you could claim the incoming HTLC with Q itself and that’s basically how it works in a nutshell. Bob sends out P on the outgoing payment. After he gets the private key for P, he can compute the private key for Q, which is the private key for P plus R. And that re-randomizes it and lets us eliminate this correlation. One interesting aspect is that this enforces a causal dependency on the route itself where before if anyone found out about the hash-lock ahead of time they could still claim it. But now Dave must claim it, then Eve, then Bob, then Carol. So this is still a property that needs to be analyzed I guess.
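The key-lock relation can be checked with a few lines of elliptic curve arithmetic. The sketch below uses a small pure-Python implementation of the secp256k1 group and assumes the outgoing point is derived as P = Q - r*G, which is one reading consistent with the private key for Q being the private key for P plus the scalar; the variable names and the way the example secrets are generated are assumptions, and this code is not constant-time or otherwise suitable for real keys.

```python
import hashlib

# secp256k1 domain parameters
P_FIELD = 2**256 - 2**32 - 977
N_ORDER = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)


def add(a, b):
    """Add two curve points; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P_FIELD == 0:
        return None
    if a == b:
        lam = 3 * x1 * x1 * pow(2 * y1 % P_FIELD, -1, P_FIELD)
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_FIELD, -1, P_FIELD)
    lam %= P_FIELD
    x3 = (lam * lam - x1 - x2) % P_FIELD
    return (x3, (lam * (x1 - x3) - y1) % P_FIELD)


def mul(k, point):
    """Scalar multiplication by double-and-add."""
    k %= N_ORDER
    result = None
    while k:
        if k & 1:
            result = add(result, point)
        point = add(point, point)
        k >>= 1
    return result


def neg(point):
    return None if point is None else (point[0], -point[1] % P_FIELD)


def scalar(label: bytes) -> int:
    """Deterministic example scalar (for the demo only)."""
    return int.from_bytes(hashlib.sha256(label).digest(), "big") % N_ORDER


q = scalar(b"secret behind the incoming lock")  # not known to Bob up front
r = scalar(b"per-hop scalar in Bob's payload")  # Bob reads this from his payload
Q = mul(q, G)                                   # key on Bob's incoming HTLC

P = add(Q, neg(mul(r, G)))   # Bob's outgoing lock: P = Q - r*G
p = (q - r) % N_ORDER        # secret revealed when the outgoing HTLC settles

q_recovered = (p + r) % N_ORDER  # Bob claims the incoming HTLC with p + r
```

An observer who sees only Q on one link and P on another cannot tell they belong to the same payment without knowing r.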
Blinded channel outsourcing
The final thing I want to talk about is blinded channel outsourcing. In lightning, you are required to watch the channels at all times. Well, maybe not all times, depending on how long your value of T is. But you need to watch the chain to watch for your counterparty or adversary trying to cheat you essentially. So there’s some schemes proposed that is based on secure hardware like teechan on SGX. Every single output to yourself has this relative time-lock (CSV) delay. And this delay is there because you might be trying to cheat your counterparty by broadcasting a prior state, so the protocol makes you wait, and this period acts as an adjudication period where the chain doesn’t really know who is in the right or wrong. It’s up to the participants to give the chain the information to prove that Bob cheated in this particular way or whatever.
Thankfully this can actually be outsourced to a third-party. You can send them some information and tell them that if they see this on the chain then they can act on your behalf, claim the money, and help you out. So currently it could be more efficient, if we had SIGHASH_NOINPUT or MAST we could actually make this order log n because right now we need to send signatures for every single state. But if we didn’t include the public key script itself in the signature then we could use only one signature. We could also use things like MAST to hide the script and make it more efficient.
We can’t make it super-efficient, but we can make it more private. This is some stuff that Tadge Dryja came up with. He calls it blinded channel monitoring. What we’re trying to achieve is that even though the outsourcer is watching our channel for us, we don’t want them to know exactly which channel they are watching for us. Maybe they’re colluding with our channel counterparty and they realize this is Laolu’s channel state and then they delete it because maybe they have a grudge or they are adversarial in some way. Instead, we randomize the data that we give the outsourcer, such as the public keys, such that the channel outsourcer doesn’t actually know which channel they are watching on the chain. The cool thing also is that we use this “reverse” merkle tree structure for the revocations, where a revocation is kind of like a secret which when revealed allows me to take a prior state, so we can collapse the state a little bit. It’s still O(n), linear, but at least we can make it a little bit more efficient.
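The talk does not spell out the blinding construction, so the following is only one way the idea could be sketched, under explicit assumptions: the client hands the outsourcer a short hint derived from half of the breach transaction id, plus a response encrypted under the other half, so the outsourcer learns which channel it was watching only if the breach actually appears on chain. The cipher is a toy XOR keystream with no authentication, and every name here is hypothetical.

```python
import hashlib


def keystream(key: bytes, n: int) -> bytes:
    """Toy keystream: SHA-256 in counter mode (not a real cipher)."""
    out, counter = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def make_blob(breach_txid: bytes, response_tx: bytes):
    """Client side: a lookup hint plus an encrypted response transaction."""
    hint, key = breach_txid[:16], breach_txid[16:]
    return hint, xor(response_tx, keystream(key, len(response_tx)))


class Outsourcer:
    def __init__(self):
        self.blobs = {}  # hint -> ciphertext; reveals nothing about the channel

    def store(self, hint: bytes, ciphertext: bytes):
        self.blobs[hint] = ciphertext

    def on_transaction(self, txid: bytes):
        """Called for every txid seen on chain; decrypts only on a match."""
        ciphertext = self.blobs.get(txid[:16])
        if ciphertext is None:
            return None
        return xor(ciphertext, keystream(txid[16:], len(ciphertext)))


# Stand-in 32-byte ids and transaction bytes for the demo.
breach_txid = hashlib.sha256(b"revoked commitment state").digest()
other_txid = hashlib.sha256(b"unrelated transaction").digest()
response = b"pre-signed transaction that punishes the breach"

tower = Outsourcer()
tower.store(*make_blob(breach_txid, response))
ignored = tower.on_transaction(other_txid)
recovered = tower.on_transaction(breach_txid)
```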
Some things about channel outsourcing. Why would the outsourcer want to do this at all? Maybe we pre-pay them with a dollar or two. Maybe that will account for their disk space and bandwidth because I’m only doing 10,000 state updates over an entire year. Maybe we could give them a bonus where if Bob tries to cheat me and you catch him then you get some bonus and that incentivizes them to do this. One problem is sybil resistance where if we don’t have any authentication that we know it’s a real channel then someone could give them garbage data and fill up their disk. But there’s one possible solution which is that we could require the user to give the outsourcer a linkable ring signature which says here’s a bunch of channels we know are on the chain and I’m going to give you a signature that proves that I am one of those channels. With this, the outsourcer could know actually okay this is a real channel in the chain with something at stake. The linkability of the linkable ring-signature allows the channel outsourcer to know if someone has signed it twice, so they know they can reject that data or information and this ensures that there’s only one unique user per channel. And that’s everything.
Q: I feel like I just got three PhDs. Can you talk a little bit more about the sybil attack resistance?
A: For the channel outsourcer?
Q: On the previous slide.
A: Yeah. So this is just like…
Q: Ring signatures with respect to that?
A: As a node, we’re assuming the outsourcer is like a lightning node itself. A node knows all the open channels. A naive way would be the user would say hey that’s my channel but that defeats the purpose of the blinding so instead we could have the user tell the watchtower(?) hey here’s a hundred channels on the chain which haven’t been claimed yet, I’m one of these channels. Give them a large signature. This proves in a privacy-preserving way that the user has a real channel on the network.
Q: Is that similar to the way that SPV wallets work with doing a bloom filter that say I’m going to look at these transactions but you don’t know which one I’m looking at?
A: Bloom filters don’t give users much privacy at all. This is instead more similar to like in monero they use this for hiding which output you’re spending. It’s the same tech you would be using there. I own one of n of these channels. But if I try to say I own this channel twice, it’s linkable, so I can say that I’m dropping that state because I’m already watching that channel.
Q: I think you went over this already, but how is it that you’re able to prevent timing attacks?
A: We really can’t. Right now we don’t try to address them. We could try to fake traffic itself like a ping-pong reply. We could have a pong with x bytes of data, and this could fake traffic and interleave real payments between that. The software would be more sophisticated and there’s literature about how hard this is to do. You could send dummy traffic around and interleave your real payments within the dummy traffic.
Q: How does your approach differ from the raiden network?
A: I’m not really sure what their state is currently. We have specifications. I don’t think I have seen them publish anything. It’s based on a hash-lock, and theirs is a little bit different because they have the ethereum network and they can load lots of weird logic into the contracts themselves and maybe have different assets or different things like that. Some of that is replicable in bitcoin, like if you use private contracts off-chain. But yeah they are very similar and it’s feasible that the networks could be connected, to connect bitcoin and ethereum using things like this.
Q: Version updates.. security model.. in the future?
A: When a node announces its state, it has global features. When you connect to a node, it announces local state. And inside of that message it’s essentially a bit vector so we could set bits to denote what I support. So maybe I support some new channel model and then I flip that bit. It makes it incompatible locally but still compatible globally. We can use those features to gate in new behavior without shutting down the entire network and making everyone upgrade at the same time.
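The feature bit vector can be pictured as an integer whose bits each stand for one optional behaviour. The sketch below is illustrative only: the bit positions and feature names are invented, and the real assignments and encoding are defined in the specifications.

```python
# Hypothetical feature bit positions, for illustration only.
FEATURE_NEW_CHANNEL_MODEL = 0
FEATURE_EXTRA_ROUTING_INFO = 1


def set_feature(features: int, bit: int) -> int:
    """Return the bit vector with one feature turned on."""
    return features | (1 << bit)


def supports(features: int, bit: int) -> bool:
    return bool((features >> bit) & 1)


def negotiated(ours: int, theirs: int) -> int:
    """Features both peers advertise, so new behaviour is used only with them."""
    return ours & theirs


# An upgraded node advertises the new channel model; an older peer does not.
upgraded = set_feature(set_feature(0, FEATURE_NEW_CHANNEL_MODEL), FEATURE_EXTRA_ROUTING_INFO)
older = set_feature(0, FEATURE_EXTRA_ROUTING_INFO)
shared = negotiated(upgraded, older)
use_new_model = supports(shared, FEATURE_NEW_CHANNEL_MODEL)
```

The two nodes still interoperate on the features they share, which is how new behaviour can be phased in without a network-wide upgrade.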