Home < TABConf < TABConf 2022 < Lightning is Broken AF (But We Can Fix It)

Lightning is Broken AF (But We Can Fix It)

Speakers: Matt Corallo

Date: October 14, 2022

Transcript By: natan-del-prado via review.btctranscripts.com

Tags: Lightning

Media: https://www.youtube.com/watch?v=s9KMRWkcwtE

Introduction

Thank you. Yeah, so, as the title says, for those of you who work in lightning or have been around lightning for a long time, hopefully you will recognize basically all the issues I’ll go through today. I’m going to go through a very long list of things. But I’m going to try to go through most of the kind of, at least my understanding of the current thinking of solutions for a lot of these issues. So hopefully that most people will at least have a chance to learn something or at least get some better idea of where things are going, even if you already know all the issues. Some of these issues get a little complex. I’m going to try to explain them the best I can. I’m not always the best at explaining things. If you’re confused, please do interrupt me. Throw something at me. Clearly throw really loud. Just yell a question. I’m okay with all of that. I want to make sure everyone is on the same page at the end of this. So feel free to interrupt me as we go rather than holding your questions to the end if I’m not making something entirely clear.

Hum! So yeah, I mean, you know, Lightning is obviously maturing quickly. People look at the adoption curve. There’s all kinds of exchanges and people are actually paying for Lightning now. The UX is actually getting kind of good. You can pay for stuff with Lightning. I prefer to pay for stuff with Lightning at this point. Not because I like Lightning, but because it’s actually easier now. But a lot of that is just the adoption curve. Development continues to move. There’s a lot of people working on Lightning. A lot of brilliant people working on Lightning. But Lightning has gotten big and it’s kind of hard to make progress very quickly. And so it moves at the rate it moves because we don’t want to break a live running system. But there’s still lots and lots of issues. We’re going to talk about a bunch. And we have to fix these issues, but without breaking things. And that makes Lightning development still kind of early, even though the adoption is showing a lot of progress.

Devices are Always Online

So, you know, everyone who’s been around Lightning knows that Lightning requires your devices to be online all the time. You’ve got to check the blockchain every now and again. You’ve got those messy timeouts and stuff. You’ve got to be able to go to the chain. Early in Lightning, everyone decided that this was fine. We’re going to solve this by putting your node on your Raspberry Pi at home. This is a great model. It’s really cool. You keep your Raspberry Pi plugged in all the time. Your internet never goes out at home. I don’t know where you guys live that your internet never goes out. But hey, whatever. And you keep your Raspberry Pi at home. It’s great. And then you’re in Florida and you’re in a hurricane. And suddenly your Raspberry Pi is offline. And someone steals your money. But it’s okay because you lost your house too. So you just wanted that extra pain and suffering that week. Okay, so well, maybe we won’t put it on a Raspberry Pi. That’s a thing for some people. For others, maybe it’s too much. Maybe let’s put it in the cloud. There’s these great cloud service providers. You can pay them some money every month. They’ll keep our servers online. Until somebody trips over a cable at US East. And then the entire internet is down. So at least you’re not alone. But you just lose money. Or maybe you used, I don’t know, there’s a bunch of bad services. But someone contacts support. Tells them they lost their account. And they steal your money because support helpfully lets them into your account. This has only happened a handful of times. But you don’t want to be that guy. Okay, well, maybe let’s not put it in a cloud then. The cloud kind of sucks.

Pesky Users With UX Expectations

Okay, well, it turns out most users like mobile phones. Mobile phones are great. Everyone actually wants a normal experience. What they’re used to. Users kind of want this experience that they’re used to. They install an app. And suddenly all the stuff works. It’s in the app. It turns out all these things are actually just like clients to a server. And we didn’t want to go custodial. So maybe that’s not great. But maybe we can make apps work. Maybe we can get Lightning in the app. And we’re not going to have any drawbacks there. It turns out users sometimes go hiking. Like out in the woods. I don’t know why you would leave your basement. But some people do it, I’m told. And then their phone’s offline for three days. And they lost their money again. Or actually, it turns out this problem is more nuanced. It’s not this simple. There’s two issues here. There’s the side that you have to be online. You have to check in every day or week or what have you. You can change the times. 40 was the number until last week in LND. But now it’s getting higher. So you’re going to have to check in only once a day maybe. So, okay, maybe we can do that. Maybe the phone is probably online once a day. It turns out you also need to be online to receive the payment. So right when that payment comes in, you want to be online to actually see it. That’s really great. On the phone, you can get push notifications. Your app can run. Oh, no, actually, it doesn’t work that way. So phones are very conservative with battery. The apps you open all the time, if you’re always on your Bitcoin wallet, you’ll probably get CPU usage when the notification comes in. The notification will come in. The app can wake up. It can do some work. It can interact with the HTLC. It can claim the payment. All good. If you only open the app once a month, the phones, especially iOS, really don’t want to give you any CPU time. They’re just like, yeah, the user doesn’t use the app much. CPU wastes battery, and network access especially wastes battery. And so it would be a really bad user experience if their battery gets drained. I mean, we are paying and giving them money, but we can’t drain their battery. So it turns out you actually can’t do that a lot of times. So it’s a nuanced problem.
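To put the "40" above into wall-clock terms, here is a rough conversion. It assumes the number is a timelock measured in blocks and that blocks arrive about every ten minutes on average. The 144 value is only a hypothetical "about a day" comparison point, not a figure from the talk.

```python
# Rough conversion from a timelock measured in blocks to wall-clock time.
# Assumption: blocks arrive every 10 minutes on average (real intervals vary).
AVERAGE_BLOCK_MINUTES = 10


def blocks_to_hours(blocks: int) -> float:
    """Approximate hours covered by the given number of blocks."""
    return blocks * AVERAGE_BLOCK_MINUTES / 60


# 40 is the number mentioned in the talk; 144 is a hypothetical comparison.
for delta in (40, 144):
    print(f"{delta} blocks is roughly {blocks_to_hours(delta):.1f} hours")
```

Under these assumptions a 40-block window is well under a day, which is why a phone that only checks in once a day needs a larger number.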

Async Payments

There’s kind of two sides of it. So, okay, well, sending, I mean, when you send a payment, you probably have the app open. So I guess that’s not actually a problem. But when you receive a payment, that’s a problem. So in Lightning, we’re sending these payments. There are multiple hops. Hopefully many hops for privacy, four or five hops, but often less. But you’re sending a payment. You’re sending these HTLCs, right? So along the entire path, you have to lock in money. You have to commit some portion of the channel’s total available resources to this pending payment. And that phone, in theory, needs to be online to say, hey, now I’ve cleared the payment, and now you’re not holding these resources hostage. If the phone’s not online, you’re holding the resources hostage the whole time. This is terrible. Okay, so what do we do? Well, there’s this, we’ve now rebranded it async payments, is what we’re going to call it. There’s this old proposal that I had to solve this problem, kind of. The intuition is basically just the phone tells the sender when it’s online. It’s not that complicated. Well, okay, it’s not that complicated in theory. In practice, it’s very complicated. So we actually need layers of things we need to build to get to the point where we can do this. So we need Onion Messages. Onion Messages is a really great platform. How is the receiver going to tell the sender when it’s online without revealing who it is? Yeah, that’s hard. But we have this great Lightning network. We have all these nodes. We can run payments through them. So we should use Onion Messages. We should be able to send messages over Lightning. They’re finally coming. I think Eclair has it live, turned on by default. Core Lightning has it live, but you have to turn on a flag because it’s still kind of beta. But they had it first, so you can’t give them too much shit. LDK now has it. We’re working on some of the finishing touches for it. So we can forward messages. 
So we have to send these messages to say, “Hey, I’m online. Send me HTLC now.” Okay, well, that’s great. But what about sending? So the user opens the phone, sends the payment, and then the payment fails because there wasn’t enough liquidity on the route. And then they’ve closed the app, put it in their pocket, and oops, now we can’t retry. Well, that sucks. Okay, well, how do we retry? Well, we have Trampoline. Trampoline’s great. Eclair’s been doing Trampoline for, I don’t know, forever at this point. And you send the payment to the LSP, the immediate first hop. You tell it where to send it. Oh, wait, crap, we didn’t want to tell the server where we’re sending all our payments. Bad for privacy. All right, well, we’ll do this Trampoline thing, and we’ll have multiple Trampoline hops. All right, we’ll get there one day. Okay, well, that’s one more thing we need to build for async payments. We’ve got to have Trampoline so that we can send it to the first hop and tell the first hop where to route the payment, and it can retry for us, even if the phone’s offline. Okay, well, that’s another thing I have to build. Okay, what about, so users really love this user experience that they get with Venmo, that they get with Cash App, that they get with all these other things. You get a QR code, and you can put that QR code on Twitter, and then people can just scan that QR code and send you money. It’s a really great user experience. People love it. Lightning should try this. It turns out we can’t do that because the invoices are single-use. That’s a problem. Okay, so we want to have multi-use invoices where the recipient can be offline when we want to send the payment, and then it just tells the sender when it comes online. Okay, so we need onion messages, we need Trampoline. Oh, single-use. We need PTLCs. So not only that, we need the sender to be able to create the payment hash, so the HTLC construction in Lightning. 
You have to have the hash and a preimage of the hash that the recipient reveals to the sender in order to claim the payment. There’s an old proposal to use Schnorr signatures, and now with Taproot we can actually implement this. It’s not done yet. But we can swap the hash and the preimage for what are called adaptor signatures. You don’t really have to know how they work, but we can do magic in EC land, in elliptic curve land, and we can use actual signatures for it. And this actually allows the sender to say, hey, recipient, you gave me this awesome public key, which is similar to the payment hash in Lightning. I’m going to actually add my own little bit to it, and then it’s unique, because we can’t use a payment hash more than once. The tag has to be unique, so the sender can add a little bit of information to it. And now we can get this great single QR code experience with offline receive, and we only need to build three massive features into Lightning to get it there, and then we also need a few follow-on things. We’ll get it one day, I promise. Okay, so that’s async payments.
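The difference between a reused payment hash and a sender-tweaked point can be sketched in a few lines. This is a toy illustration under stated assumptions, not how any Lightning implementation does it: the helper names (`point_add`, `scalar_mult`, `tweaked_payment`) are hypothetical, the curve arithmetic is plain unoptimized Python, and the adaptor signatures that would actually enforce the lock on-chain, as well as how the tweak reaches the recipient, are left out entirely.

```python
import hashlib

# secp256k1 domain parameters.
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)


def point_add(a, b):
    """Add two curve points; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if a == b:
        slope = 3 * x1 * x1 * pow(2 * y1, P - 2, P) % P
    else:
        slope = (y2 - y1) * pow(x2 - x1, P - 2, P) % P
    x3 = (slope * slope - x1 - x2) % P
    y3 = (slope * (x1 - x3) - y1) % P
    return (x3, y3)


def scalar_mult(k, point):
    """Double-and-add scalar multiplication."""
    result, addend = None, point
    k %= N
    while k:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result


# HTLC style: the recipient picks the preimage, so every payer of a reused
# invoice would lock funds to the very same hash.
preimage = hashlib.sha256(b"recipient secret").digest()
payment_hash = hashlib.sha256(preimage).digest()

# PTLC style: the recipient publishes one point and each sender adds a tweak.
recipient_secret = int.from_bytes(hashlib.sha256(b"recipient key").digest(), "big") % N
recipient_point = scalar_mult(recipient_secret, G)


def tweaked_payment(sender_label: bytes):
    """Hypothetical helper: derive a per-payment tweak and payment point."""
    tweak = int.from_bytes(hashlib.sha256(sender_label).digest(), "big") % N
    return tweak, point_add(recipient_point, scalar_mult(tweak, G))


tweak_a, point_a = tweaked_payment(b"payment from alice")
tweak_b, point_b = tweaked_payment(b"payment from bob")

# Once the recipient learns a payment's tweak, it knows the scalar behind that
# payment's point, and two payments to the same key lock to different points.
assert scalar_mult((recipient_secret + tweak_a) % N, G) == point_a
assert point_a != point_b
```

The point of the sketch is only the algebra: one static recipient key can back many distinct payment locks, which a single payment hash cannot do.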

Bugs Take Your Node Offline

All right, so we’ve solved the always online problem, right? Oh, wait, it turns out sometimes software has bugs. It happens to all of us. We’ve all done it, and then your node’s offline, and now you can’t enforce HTLCs. Oops, that’s a problem. Okay, so as we’ve all learned this week, the time locks in your Lightning settings have to be at least your response time. So when there’s a bug in your node, there has to be time for the developers to fix the issue and ship a patch, and for you to apply the patch. Make sure your time locks in Lightning are well-tuned to address that.

WatchTowers Fix All Problems … Unless You Have HTLCs

Okay, well, it turns out we have this watchtower, and watchtowers fix all the problems with being online all the time, I’ve heard. That’s what I’ve been told. Yeah, unless you have HTLCs. So if you have HTLCs, watchtowers actually don’t fix your problems. Oops. So watchtowers, there’s several different models for watchtowers.

Several Watchtower Models

So there’s the kind of, you’ve probably heard of watchtowers, they’re these super private things that always make sure you get your money back. That’s one model that doesn’t actually enforce all the conditions, but the super private version of watchtowers is great. You send it a list of basically, if you see this transaction on the chain, here are some other transactions to broadcast. That’s super awesome. It’s super private because they actually don’t learn anything unless they see those transactions on the chain.
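The "if you see this transaction, broadcast these other transactions" model can be sketched as a lookup table that the tower cannot read until the trigger appears on chain. Everything here is an assumption for illustration: the class and function names are hypothetical, splitting the txid into a lookup hint and an encryption key is one possible scheme, and the cipher is a toy that should never be used for real data.

```python
import hashlib


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher (SHA-256 in counter mode); illustration only."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)


class ToyWatchtower:
    """Stores encrypted responses keyed by a hint derived from a txid."""

    def __init__(self):
        self.blobs = {}  # locator -> encrypted response transaction

    def register(self, locator: bytes, encrypted_blob: bytes) -> None:
        self.blobs[locator] = encrypted_blob

    def on_confirmed_txid(self, txid: bytes):
        """Return the response transaction if this txid was registered."""
        blob = self.blobs.get(txid[:16])
        if blob is None:
            return None
        # Only the full txid, seen on chain, yields the decryption key.
        return keystream_xor(txid[16:], blob)


def make_registration(revoked_txid: bytes, response_tx: bytes):
    """Client side: split the txid into a lookup hint and an encryption key."""
    return revoked_txid[:16], keystream_xor(revoked_txid[16:], response_tx)


tower = ToyWatchtower()
revoked_txid = hashlib.sha256(b"old commitment transaction").digest()
locator, blob = make_registration(revoked_txid, b"raw justice transaction bytes")
tower.register(locator, blob)

unrelated = hashlib.sha256(b"some other transaction").digest()
print(tower.on_confirmed_txid(unrelated))     # None
print(tower.on_confirmed_txid(revoked_txid))  # b'raw justice transaction bytes'
```

Until the registered transaction confirms, the tower holds only a hint and an opaque blob, which is where the privacy comes from.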

Privacy-HTLC-Enforcement Tradeoff

Except that’s not actually the Lightning trust model if you have an HTLC. So in Lightning, where you are forwarding payments, you receive a payment on one side and you have to forward it on another, if you’re a big routing node. If those channels go to the chain, or the upstream channel goes to the chain, and they reveal the preimage for this HTLC, clearing the HTLC and getting the money from you on chain, you have to see that preimage, take that preimage, and hand it back to the node before you, the one you received the payment from and forwarded it on for. Yeah, but watchtowers don’t do that. Especially since they want to be private, ideally, and you would have to reveal information about your individual channels. You have to tell them, like, hey, this is my upstream channel, and this is my downstream channel, and the payment that I was forwarding had this payment hash, and it came from this channel and went to that channel, and here’s how you broadcast the current state for that other channel and reveal the payment preimage. And that would be entirely non-private in any way, shape, or form, and the watchtower would learn everything about what’s happening in your channel. Yeah, well, I mean, hey, you can do that. So maybe large routing nodes should consider using trusted watchtowers, and should actually run their node as multiple independent machines that are running these kinds of pseudo-watchtowers that learn everything about the routing node. But when you actually think about running a watchtower as many of them are implemented today, they’re super private, they’re super great, but keep in mind that they can’t enforce HTLC state. So you might want to consider changing your max HTLC in-flight limit so that you’re only ever exposed to a smaller amount of risk, because you only allow a certain amount of money moving through your node at once.

Eltoo Fixes This?

Let’s see, so I’ve heard watchtower problems are all fixed with Eltoo, right? Watchtowers can’t scale, and they have to store a bunch of data. Eltoo is this great new proposal. It actually requires some potential soft forks or consensus changes to Bitcoin to deploy, but it allows us to store only one little bit of constant-sized data with the watchtower, and we don’t have to store all of this old historical state. We know watchtower data storage grows as the channel state moves forward in time, and it grows unbounded, you can never delete old stuff. With Eltoo you can delete all the old stuff, that’s great. Oh, no, it has the same problem. You still have this privacy-HTLC-enforcement trade-off. It doesn’t solve that problem because, again, the watchtower would have to know which payments were forwarded across which channels and how to close the current state of the channel, which means they always need to see the current state of the channel, which was kind of the point to avoid. All right, well, watchtowers are helpful, but they only do some things.

Transactions and Mempools and Pinning, Oh My

Okay, all right, so, this one’s fun. Antoine, you want to explain this one? Okay, I tried. Okay, so this is the other common trope to beat. Transactions and mempools and pinning, there’s so many attacks here, it turns out the Lightning security model is, if something happens on-chain or if an HTLC is about to expire, not only do you have to be online, but you have to make sure you can get a transaction confirmed on the blockchain within some time period, some time limit. You have to, maybe you have to claim an HTLC with a preimage, maybe you have to force close the channel and do that, or maybe you have to force close the channel and then time out an HTLC, whatever it is, you’ve got to get a transaction on the blockchain within some time limit. So historically, early Lightning, we’ve actually fixed one of the problems on this slide, so that’s good, good progress.

Predicting Fees is Impossible

We started with this model where we had commitment transactions, right, so in Lightning you’re kind of always updating the current state, and the current state is actually a broadcastable transaction that you can go to the blockchain with at any time. But you only update the fee on that channel, on that current state or that current transaction, when both parties are online and both parties are accepting the new fee. But what happens if one party is offline and the fees are going up? Now you have a time limit and you’ve got to get that transaction confirmed, but the fees are going up and the fee on that transaction is out of date, and suddenly you broadcast a transaction but it just doesn’t have enough fee. Well, you’re kind of screwed. So that was really bad. We can’t predict what the future fees are going to be. So we tried that for a while and that actually did cause some problems in practice, and so we added this thing called anchors. And so this is a commitment transaction. This is an actual commitment transaction in Lightning. It has that current state, that current transaction we can broadcast. The to_remote and HTLC outputs are all the things that were already there. It’s your money, your peer’s money, and then the stuff for the HTLCs, all these current in-flight payments. Great, we know and love those things. We added these anchors, so these are additional outputs on the transaction that allow you to spend and potentially do child pays for parent. So there’s two solutions traditionally to the problem of, I have a transaction that doesn’t have enough fee, I need to get it into the blockchain now. The first is replace-by-fee, right? So we’d like, ideally, to take the transaction, modify the transaction, assign a higher fee to it, and announce that. We can’t do that because these transactions are cosigned. You’ve got your counterparty, and your counterparty might be evil, or they might simply be offline, so we can’t do RBF, but you can still do child pays for parent. So that means we create two transactions, one transaction spends the other transaction, this one has lots of fee, this one has very little fee, but that’s okay because Bitcoin Core can look at that and say, oh, if I include both of these, I still get a decent fee, but I have to include both in order to get this high fee thing. So Bitcoin Core is smart, you can do that, it’s great. This is available in most Lightning implementations. We’re still working on it, I think it’s default in some, not default in others, and the logic to actually construct this transaction is actually super complicated, and so it’s still kind of a work in progress to make it super robust, but okay, great. This works in theory, let’s pretend that works in practice, I don’t know, spherical cows or something. Great, great, great.
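The child-pays-for-parent arithmetic described above can be sketched as follows. The `Tx` type, the function names, and all fee and size numbers are hypothetical values chosen for illustration; the idea is only that a miner evaluating the pair together sees the combined fee over the combined size.

```python
import math
from dataclasses import dataclass


@dataclass
class Tx:
    fee_sats: int
    vsize_vbytes: int


def package_feerate(parent: Tx, child: Tx) -> float:
    """Feerate a miner earns by including parent and child together."""
    return (parent.fee_sats + child.fee_sats) / (parent.vsize_vbytes + child.vsize_vbytes)


def child_fee_for_target(parent: Tx, child_vsize: int, target_feerate: float) -> int:
    """Fee the child must pay so the pair reaches the target feerate."""
    total_vsize = parent.vsize_vbytes + child_vsize
    needed = math.ceil(target_feerate * total_vsize) - parent.fee_sats
    return max(needed, 0)


# Hypothetical numbers: a stale commitment paying 1 sat/vB, and an anchor
# spend that has to lift the pair to 20 sat/vB.
commitment = Tx(fee_sats=300, vsize_vbytes=300)
anchor_spend_vsize = 150
fee = child_fee_for_target(commitment, anchor_spend_vsize, target_feerate=20)
anchor_spend = Tx(fee_sats=fee, vsize_vbytes=anchor_spend_vsize)

print(fee)                                        # 8700
print(package_feerate(commitment, anchor_spend))  # 20.0
```

The child ends up paying far more than its own size would suggest, because it is also paying for the underfunded parent.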

You Can’t know What’s in The Mempool

Yeah, and then you can’t actually know what’s in the mempool. So what if, let’s just spitball here, instead of you broadcasting your current state, your counterparty broadcasts their current state, and they do this anchor thing, and it’s really cool, except instead of lots of fees, they just don’t put lots of fees there. So now this whole multi-transaction package is in the mempool, it’s sitting there, and it actually doesn’t have lots of fees, so it’s still low fee and it’s still not going to confirm. What do we do? Well, okay, we’ve got this second anchor here, this anchor is our anchor, we’re allowed to spend this one, we can construct our transaction, but we don’t know what’s in the mempool. We don’t know that this thing is there. Mempool is not, there is no one mempool, there are individual mempools on different nodes, there are various things you can do to cause nodes to have different mempools. That’s fine, that’s how Bitcoin works, that’s why we have blocks. If everyone had the same mempool, we would never need to build blocks, because we all agree on what the current transactions are. Just throw the miners out. So we can’t do that, we don’t know what’s necessarily in the mempool of the miners, maybe we know what’s in our mempool, but that’s not so useful, we don’t know which transaction is where to construct our anchor spends. So that’s hard. Okay, so that’s already in the mempool, let’s just take our transaction that we had on the last slide, our version of it, our local commitment transaction, and let’s just announce our package, right? Let’s throw that out on our Bitcoin Core node, and that has higher fees, so nodes should accept that instead. Turns out Bitcoin Core just isn’t that smart. So there’s no way currently in Bitcoin Core to hand it two transactions at once and say, hey, I know you have some other transactions in your mempool, you should take both of those transactions and throw them out and replace them with both of these transactions. 
You can do it via RPC, but not over the peer-to-peer network, so this is what we need to solve this problem. Gloria, who I think actually just got on a plane and left, has been working on a proposal to Bitcoin Core to add support for being able to do that, for being able to hand it two transactions at once. Great, so once we get that, we’ll be able to do secure anchors, all problems with transactions getting into the blocks are solved.

DoS Limits Are Always Exploitable

Oh, crap, there’s other problems. What if instead of just doing little fees, they also did lots of size? What if this transaction is really big, like 100 kilobytes, and then there’s another transaction based on it that’s another 100 kilobytes, and there’s just a ton of crap in the mempool that our counterparty put there? Okay, so that’s fine. Our transaction is actually still higher fee rate. Higher fee rate, miners should maybe prefer that, we hope. And they should still prefer that, but Bitcoin Core also has anti-denial of service rules. So if you were to just start relaying our other package in Bitcoin Core, this would be a trivial denial of service attack on Bitcoin Core nodes just in general, because here they are accepting these very large transactions, relaying them onto each other, validating these large transactions, wasting a bunch of bandwidth, and then replacing them with very small transactions. So if you simply blindly accepted this with allowing our earlier transaction to replace our counterparty’s very large package, then you would open Bitcoin Core nodes up to wasting a ton of bandwidth and wasting potentially a lot of CPU resources with them not actually paying a fee. So Bitcoin Core has this concept that there should be no free relay in replacement rules or in any transaction broadcasting. So basically that you should have to pay at least one Satoshi per Vbyte for every Vbyte worth of transaction data that you propagate through the network. So that means when we replace this big package with smaller stuff, we need to pay for all of the relay of the big package in order to replace it, because otherwise it would be free relay. So that sucks, because now our counterparty can make us pay a lot of fee to replace their package, and we don’t even know that it’s in the mempool, so we don’t know that we have to do this, to have to pay a lot of fee to replace their package, and now our counterparty is just extorting us for money. 
So that’s not ideal either. Turns out Gloria, to the rescue again, she has another proposal to also address this issue via another change to Bitcoin Core, I think now called Transaction Version 3. So what we’re going to do is we’re going to say, okay, Bitcoin Core is going to add a little special utility for Lightning, we’re going to mark these commitment transactions that both parties sign off on as Transaction Version 3, we’re going to put a little version 3 on the top, and then our counterparty is not allowed to do this. So our counterparty will no longer be allowed to broadcast very large descendants of our commitment transaction, and suddenly this attack goes away. Sadly, so it’s important to note that any denial of service limits in Bitcoin Core will allow this kind of attack, or most denial of service limits in Bitcoin Core might allow this kind of attack. So it’s not just size, it’s also the number of transactions here, it’s also various other constraints that Bitcoin Core does to make sure they don’t get denial of service attacked, have these kinds of issues, and so we need really TX Version 3 to just say, you can only construct a package that looks like this, so very limited, and then allowing us to replace it. So hopefully that solves our issue, and we can finally get transactions into the blockchain, unless everyone is trying to get into the blockchain at the same time. I don’t know what we’d do there.
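The cost of the "no free relay" rule to the victim can be sketched with a simplified model. This is not Bitcoin Core's code: the function name is hypothetical, the rule is reduced to "pay at least what the evicted transactions paid, plus the relay feerate for your own replacement", and all sizes and fees are made-up numbers.

```python
import math


def min_replacement_fee(evicted_fees_sats, replacement_vsize, relay_feerate=1.0):
    """Smallest absolute fee for a replacement under the simplified rule:
    cover the fees of everything evicted, plus relay cost of the new data."""
    return sum(evicted_fees_sats) + math.ceil(relay_feerate * replacement_vsize)


our_package_vsize = 450  # hypothetical: our commitment plus our anchor spend

# Honest case: the counterparty's package is small and cheap to replace.
honest = min_replacement_fee([300, 700], our_package_vsize)

# Pinning case: the counterparty attached two 100,000 vbyte descendants,
# each paying only 1 sat/vB, so each carries 100,000 sats in fees.
pinned = min_replacement_fee([300, 100_000, 100_000], our_package_vsize)

print(honest)  # 1450
print(pinned)  # 200750
```

In this model the attacker's junk never needs to confirm; its size alone sets the price the other side has to beat.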

Large-Scale Closures

Yeah, so most of you have probably seen the old Flood & Loot paper. There’s various names for this type of attack, different specific styles of the type of attack, but basically if you hold a bunch of HTLCs, a bunch of different nodes have to force close their channels all at the same time, and there’s only so much block space, and how do we get all those channels onto the chain? Yeah, I don’t know. So this is one of those longer term problems that we don’t really have a great solution for. There’s various proposals, you can have some kind of …, I guess maybe a hard fork in Bitcoin, where you can prepay for block space, and then you can prepay and have guaranteed block space for you in the future. That probably doesn’t work because you’ll just open up all the Bitcoin Core nodes to denial of service attacks, can’t do that. Maybe you can have some way to scale the time delays, right? So you have these delays in Lightning, where it says this HTLC expires at this block height, and we have to make sure we can get our commitment transaction into the blockchain before that block height. Well, maybe we can make these numbers scale, like if there’s a lot of mempool backlog, or there’s a lot of stuff going into the blocks, maybe we can allow those numbers to grow a little bit. It’s not clear exactly how you do that. There are some proposals, but they probably rely on other earlier changes to Bitcoin. All of these require some kind of Bitcoin fork, there’s not a way to do it naively. So that’s a much longer term concern, and unclear exactly how we should solve it, but probably at some point we’ll get some soft fork, I guess, hopefully, maybe someone has a good idea.
In conclusion, currently just do not open channels with people you don’t know, that’s a bad idea, you might get your money stolen, or at least don’t open channels with someone you don’t know who is really motivated to steal your money and is going to write a lot of code to do it, because they can probably definitely steal your money.
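The block space squeeze behind a large-scale closure can be put into rough numbers. The channel count, the per-close size, and the share of each block that closures manage to win are all hypothetical inputs; only the block limit of 4,000,000 weight units (1,000,000 vbytes) is a protocol constant.

```python
import math

MAX_BLOCK_VBYTES = 1_000_000  # 4,000,000 weight units / 4


def blocks_needed(channels: int, vbytes_per_close: int, share_of_block: float = 1.0) -> int:
    """Blocks required to confirm every close, given the usable block share."""
    usable = MAX_BLOCK_VBYTES * share_of_block
    return math.ceil(channels * vbytes_per_close / usable)


# Hypothetical scenario: 100,000 channels, 300 vbytes per force close.
print(blocks_needed(100_000, 300))                      # 30
print(blocks_needed(100_000, 300, share_of_block=0.5))  # 60
```

If the HTLC timeouts involved are shorter than the resulting queue, some of those closes cannot confirm in time, whatever fee they pay.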

Routing Payments Always Works

Many Nodes, Many Liquidity Strategies

Alright, so we’ve solved the blockchain problem, we can get transactions into the blocks, great. Lightning is this great source routing thing, so that means the sender of the payment picks the route that it’s going to take through the network, and this is really great for privacy, because we’re not just telling everyone along the route where the payment is going to go to, because that would tell everyone where our payment flow is, and they would be able to learn all kinds of information about who is receiving how much money. So instead we just say the node shouldn’t learn anything more than the place the payment came from and the place the payment is going to, and so we do this, you have to select a path. Users, it turns out, and I didn’t know this, users actually expect payments to complete quickly. I thought they were fine with 10 minutes, I wasn’t entirely sure why we were doing this lightning thing. They like 100 milliseconds, which I don’t, people are impatient I guess, I don’t know. But if you want those times, if you want that thing that the user experience people keep telling me they want, that first route has to work. Maybe the second route, maybe you can retry the payment once, but really that first route has to work. And so how do you pick a route that is going to work? How do you make sure that the route you pick through the lightning network is actually going to get the payment there, every node along the path has enough available capacity in the channel, and the payment is going to go through on the first try, and it’s going to happen quickly. There’s a ton of nodes out there. 
There’s these, like, Plebnet folks have done all kinds of different ideas for how to optimize the node, who to open channels with, how to run a node, blah blah blah, and it turns out a lot of them kind of, like, spawned a node and then got busy with life and forgot about it, and all the channels are constantly saturated, and your payments will never work going through those channels. So probably avoid those nodes, I guess. I don’t know how you learn who those nodes are, right? So you start up a node, you open some channels, you’re like, I’m going to route this payment, it’s going to be great, I’m going to send it somewhere, and like half the network is nodes where the channels are all entirely saturated because they’re completely unmaintained. And so how do you pick a route that’s going to have a high success rate? Because every time you take a hop, you have a chance of half or something that the payment doesn’t go through? Yeah, right. So maybe you download the scoring data from somewhere, maybe you ask a server, you say, hey, what’s the BOS score for all these channels, or maybe you ask a server, hey, which channels have a high success rate, how often do these payments fail? Maybe you start probing a lot, maybe you just send a lot of probes out there, you try a lot of paths all the time, even though you know they’re going to fail. I mean, you’re going to waste all of the Lightning Network’s resources, and you’re going to denial-of-service attack the Lightning Network and maybe take the network down, but your payments might succeed, unless everyone else is probing, in which case you’re screwed.
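The "chance of half at every hop" remark compounds quickly. A minimal sketch, assuming each hop succeeds independently with the same probability (a simplification; real channels are neither independent nor identical), with hypothetical function names:

```python
import math


def path_success_probability(per_hop_success: float, hops: int) -> float:
    """Probability that every hop on the path can forward the payment."""
    return per_hop_success ** hops


def attempts_for_confidence(path_success: float, confidence: float) -> int:
    """Independent attempts needed to succeed at least once with the given confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - path_success))


four_hops = path_success_probability(0.5, 4)
print(four_hops)                                 # 0.0625
print(attempts_for_confidence(four_hops, 0.95))  # 47
```

With coin-flip hops, a four-hop route works about one time in sixteen, which is nowhere near a first-try success.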

Just Use Nodes That Rebalance?

Oops. Okay, well, all right. We’ll do some scoring tomorrow, we’ll get scoring data from somewhere, and we’ll keep track of which payments succeed and fail, and which paths our payments go down, and which nodes are reliably able to route payments. That’s great, but what that ends up measuring, and this is in fact what most large nodes do, you know, sorry, let me step back, not most large nodes, most nodes do, they keep track of the reliability of all the channels in the network, and they just strongly prefer to use channels that seem to be reliable. What this ends up measuring is which nodes are actively rebalancing their channels and keeping channels rebalanced, and then you just strongly prefer all of those channels, which is a whole separate debate. There’s an ongoing debate in the Lightning world about whether rebalancing is good, or whether rebalancing is in fact zero-sum. If you’re rebalancing your channels, are you in fact just pushing that imbalance onto other people’s channels, and causing them to be unable to route payments instead of you? It needs more simulation, it needs better understanding of the Lightning Network dynamics. In some cases yes, in some cases maybe not. So that’s not ideal either, but it certainly gets your payments to succeed if you just strongly prefer these nodes that are always rebalancing.
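Tracking which channels succeed and then strongly preferring them can be sketched as a small scorer. This is a toy under stated assumptions, not the scoring logic of any Lightning implementation: the class, its smoothing choice, and the channel names are all hypothetical.

```python
from collections import defaultdict


class ChannelScorer:
    """Keeps success and failure counts per channel and ranks candidate paths."""

    def __init__(self):
        self.stats = defaultdict(lambda: [0, 0])  # channel_id -> [successes, failures]

    def record(self, channel_id: str, succeeded: bool) -> None:
        self.stats[channel_id][0 if succeeded else 1] += 1

    def success_estimate(self, channel_id: str) -> float:
        # Smoothed estimate: a channel with no history starts at 0.5.
        ok, bad = self.stats.get(channel_id, (0, 0))
        return (ok + 1) / (ok + bad + 2)

    def score_path(self, path) -> float:
        estimate = 1.0
        for channel_id in path:
            estimate *= self.success_estimate(channel_id)
        return estimate

    def best_path(self, candidates):
        return max(candidates, key=self.score_path)


scorer = ChannelScorer()
for _ in range(9):
    scorer.record("rebalanced-channel", True)
scorer.record("rebalanced-channel", False)
for _ in range(5):
    scorer.record("stale-channel", False)

candidates = [["stale-channel", "last-hop"], ["rebalanced-channel", "last-hop"]]
print(scorer.best_path(candidates))  # ['rebalanced-channel', 'last-hop']
```

A scorer like this only sees outcomes, so it ends up favoring whichever channels are kept balanced, regardless of why they are.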

“Accept Channels From Anyone” May Not Work

Okay, but then we’re only routing through this handful of nodes that are rebalancing. Does that mean that our privacy is lost again, because we’re only routing over a very small subgraph? We’re not routing over the entire Lightning graph, we’re strongly preferring certain paths within the network. We’ll get into privacy in a minute, but it’s potentially a concern as well. Also, even if we have this algorithm that tells us which channels are rebalancing all the time, and we’re going to strongly prefer those channels, what about these nodes, and there are many of them, that accept a lot of payments, receive a lot of volume, are large nodes, and accept channels from everyone? They have a thousand channels, and every single one of those channels is always completely saturated towards them. How in the hell do you pay these people? So that’s also an issue. If you’re one of the peers of this node that regularly gets almost all of its capacity pushed in one direction, you can’t rebalance cheaply or easily. You can just constantly open new channels, that’s one strategy, and in fact that’s a common strategy. But certainly when a node that just started up tries to pay you, they’re going to have a really hard time finding which of your thousand channels actually has any capacity available. And so this strategy of just, like, put up a big node, Plebnet users will all open channels to you, it’ll work itself out, you’ll get plenty of inbound liquidity, it’s not actually all that useful. In fact, you see many of these larger recipients no longer accepting channels from everyone, and starting to have known peer lists, where they accept channels only from people in a list of nodes they keep, etc. So, okay, you just only accept channels from good nodes. I don’t know how you decide what a good node is, but fine, you do that.

Inbound Liquidity is Still Hard

So inbound liquidity is still hard. Nodes still need to be able to get inbound liquidity from somewhere in order to receive payments. There are many people working on this. Of all of the problems in any of these slides, inbound liquidity is probably the one with the most solutions, the most people working on solutions, and the most different marketplaces and whatnot that have been built. Great, maybe we don’t need to spend too much time on that. There are still many being built. Now, it’s not necessarily easy, and none of these things is clearly a winner, who knows. But yeah, routing, routing is still hard. I guess I should also mention that the actual solution to routing is you just open a channel with whoever you’re trying to pay. If you’re a large node with lots of payment volume, that’s basically what you do: you just open a channel with all of your top destinations, and then it’s no problem. Your top destinations are probably everybody’s top destinations, and they have a thousand channels and they’re all saturated, so you normally can’t pay them, and no one can pay them, but you can pay them if you just open a channel. So in Lightning you always only have two hops, maybe three hops in the worst case. Everyone opens a channel with everyone, and Lightning works great, because it totally scales. N squared, I’ve heard, scales super great. If everyone has a direct channel with everybody else, we’ll never have any problems. Definitely also don’t have to overcommit liquidity there. But that’s what people do today, because that’s what works, and actually fixing this problem is hard, but for now we can work around it.
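To put a number on the "n squared" joke, here is a small Python sketch that counts the channels needed if every pair of nodes had a direct channel. The node counts are arbitrary examples, not measurements of the real network.

```python
def full_mesh_channels(node_count):
    """Channels needed so that every pair of nodes shares a direct channel."""
    return node_count * (node_count - 1) // 2


for nodes in (10, 1_000, 100_000):
    print(nodes, "nodes ->", full_mesh_channels(nodes), "channels")
```

Each of those channels is an on-chain UTXO with liquidity locked in it, which is why the talk describes this as a workaround and not a fix.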

Lightning Is Private… Right?

Routing Nodes Announce UTXOs

Alright, so I mentioned privacy, right? I’ve mentioned privacy a few times, about how you kind of want these long paths that meander through the network, so that hopefully nodes don’t learn who you are. So let me set a little bit of context before I go into privacy. As you probably know, or hopefully know, Chainalysis and all these other folks, Elliptic, if I can speak, they have all of these great datasets for UTXOs: who potentially owns which wallets, which wallets are associated with who else, and which UTXOs are clustered into wallets. They’re actually really good at this game, and it’s a hard game to defend against. They’re better at it on other chains than Bitcoin, because other chains are horrendous for privacy, but Bitcoin also has its problems, and UTXO clustering is painful, and they have really great data for that. So their primary motivation right now is they want to connect their existing dataset to Lightning. They want to be able to say, this Lightning node is this cluster of UTXOs that I’ve already identified as a wallet, and that wallet has been receiving payments from Coinbase or Kraken or whatever, and I can go ask them who it is, and then I will identify who owns this Lightning node, because I know their UTXOs. So that’s our biggest risk, our most immediate risk, because that’s what they’re doing. And what Lightning does is Lightning routing nodes just tell the entire world which UTXOs are theirs. They tell Chainalysis directly, hey guys, I’m this routing node, and here are my UTXOs. So that’s totally private, definitely the thing we want to be doing right now. All right. Yeah, so that’s a problem, maybe we should fix that. So why do we do this? DoS resistance is important in Lightning. We don’t want everyone to just be able to come online and say, like, hi, I have a channel with everyone, route payments through me, I charge zero fees, so route all your payments through me, and I can see everything that’s happening in Lightning.
That would be bad for privacy in other ways. So we need some kind of denial-of-service resistance. We need to be able to say, hey, I’m going to prove to you that I have some bitcoin on chain, I have a UTXO, it has some bitcoin in it. At least this has a cost, so it adds a cost to denial-of-service attacking the network. So we use UTXOs for that, and that’s why it’s so broken right now. So how do we fix this? Maybe we use a zero-knowledge proof, hand-wave, hand-wave: we prove that I have a UTXO, but I’m not going to tell you which one it is. That would be really great. We could, in fact, do this, it’s not so hand-wavy, but we spent a good chunk of time looking for a good ZKP system that we could just use off the shelf, and didn’t really find one that was super mature, or going to be super mature in the next few years. Hopefully this is something that changes in the coming years, so we can actually switch to a ZKP scheme for denial-of-service resistance here. Which, okay, fine. But maybe in the meantime we just prove balance. We just show, like, hey, I have some balance, I’m not going to tell you all of my UTXOs, I’m not going to assign a UTXO to every channel, we just prove some balance, and maybe that’s enough denial-of-service resistance. Clustering will probably still eat your lunch. Chainalysis, etc., are going to cluster UTXOs into individual wallets, and they’ll probably still win, because they will be able to cluster the UTXOs you used for the proof with your other funds, because it’s probably commingled a little bit, and avoiding commingling funds is really hard. People continually break this. It’s probably, maybe, how the Bitfinex people got thrown in prison: they commingled funds too much, through a bug in a wallet. It wasn’t even their fault, they were really careful, there was a bug, their funds got commingled, they went to prison. So maybe routing nodes are screwed, but maybe we can improve that in the future. What about private nodes?
Well, you still have UTXOs with the public node. Eventually that channel is going to get closed, those funds are going to get commingled with the rest of that public node’s funds, and again, clustering is going to eat your lunch. So the attacker, Chainalysis in this case, is going to learn, hey, these funds, it was a channel that they closed with someone, the channel wasn’t announced, so it was clearly a private channel, it got commingled with the rest of their funds, so I know it was theirs. I can go ask them, hey, there was this channel that you had open, what was the IP address of your peer? Even though it was a private channel, I can probably learn who it actually was by knocking on their door with a subpoena, and I’m sure they’ll let me know. Alright, so this is bad.

Receiver Has No Privacy

Okay, so maybe let’s ignore the UTXO part for a minute. That’s complicated, and maybe it’s fine. So what about in Lightning? Do we have privacy in Lightning? So you generate an invoice, you put your public key on there, you sign it, and you hand it to someone, and suddenly they know which node you are, and which UTXO is yours, and… oh shit. Okay, so the receiver has no privacy, that’s not good. But we do have a solution for this one, right? Blinded paths. If you’re familiar with onion services in Tor, or hidden services in Tor, what you actually reveal to people is, you say, hey, here’s an introduction point, it’s a few hops away from me, but if you talk to them, they’ll know how to get to me. And you tell people that, and they talk to that introduction point, and then it goes through a few more hops, and then it gets to you, and so no one in theory actually learns who the recipient is. It’s great, we can do something exactly like that in Lightning. We can create these paths. We probably don’t want to do introduction points themselves, but we can hand the entire path over to the sender. Turns out that’s a lot of data, and we can’t fit it in a QR code. Oh man. Okay, so we have this bunch of data, and we’ve got to communicate it to somebody. Ah, we have onion messages. So we can take these blinded paths, we can actually do short blinded paths and put them in the QR code, but we can’t include more than one, maybe two blinded paths in the QR code, because there’s not enough space. So we can use onion messages: you can ask for more blinded paths, and then I’ll give you more blinded paths, and then you can retry the payment a few times, and eventually find a path that has enough liquidity. Great, problem solved. BOLT 12, onion messages, we’ll get there eventually. As long as the nodes in the onion message path aren’t down. If they’re offline, then you just can’t pay. Oh well.
Okay, so, alright, so maybe we can fix receiver privacy.

Everyone (actually) Knows the Sender + Recipient

So what about routing nodes? Turns out, if you actually sit down and work on it, and there’s a number of good papers on this, a routing node can make a very, very good guess, almost not a guess, as to who the sender and who the recipient of a given payment it’s seeing are. And there’s a few reasons for this. Turns out, in Lightning today, like I said, everybody opens a channel with everyone, so there’s only actually one or two hops in most paths. So you can make a pretty good guess of who the sender and the recipient are if you’re the only hop in the path, because you are connected to the sender and the recipient. Not so complicated. You can also make good guesses as to how many hops the payment goes through. There’s the CLTV, those actual timelocks in the HTLC as it’s routing through the network. We don’t want to increase those timelocks too much, because if we do, you might have to wait a very long time before the HTLC times out, so if there’s a stuck payment, or someone’s doing channel jamming, you make the channel jamming attack worse. But if we can’t increase those too much, then all of a sudden the intermediate node can look at the HTLC and immediately say, oh, this HTLC has two more hops to the destination. And then if you look at the routing graph, you can say, ah, I know who I’m connected to, I know who’s two hops away from me, I know where this came from, I’m actually going to sit down and run common routing algorithms across all sources on the network and all destinations on the network. And it turns out there’s generally not that many routes that are two more hops from you and that have more than one or two different potential sources of this HTLC, and again, you can make a really good guess as to who the source and the destination are. So this is really bad.
So you can put yourself in a good position where you can, for example, learn how much payment volume your competitor is getting, and see their numbers before they even do. So that’s obviously pretty bad for a payment rail, if you can see your competitor’s numbers. So, yeah, it turns out this is really hard, and there’s some ideas, there’s some potential solutions. We can randomize the CLTV a little bit, and hopefully that improves things somewhat. Some nodes do this, some nodes don’t. Maybe all nodes do, I hope all nodes do, I don’t know. But routing is also a little deterministic right now, so with the routing algorithms it’s a little too easy to learn which potential paths nodes might be taking, especially if you have any guess as to which other nodes have unreliable channels, because then you know the nodes definitely won’t be taking those paths. So we could randomize the routing algorithms a little more. This is like an open academic problem: how do we build a routing algorithm that considers privacy a little bit? It’s not an obvious question, but it’s something we can improve. I mean, just adding more hops to payments by default might improve things, but we have to balance that, of course, against success rates and users who want payments to complete quickly. So it’s not clear exactly how you do this, but maybe we can make some progress.
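The CLTV leak and the randomization fix can be sketched in a few lines of Python. This is a deliberately simplified model: it assumes every hop uses the same timelock delta, which is not true on the real network, and the constants and function names are hypothetical values chosen for the example.

```python
import random

CLTV_DELTA = 40        # hypothetical per-hop timelock delta, in blocks
FINAL_CLTV_DELTA = 18  # hypothetical delta required by the recipient


def outgoing_expiry(height, hops_after_next, offset=0):
    """Expiry height a forwarding node sees on the HTLC it passes on."""
    return height + FINAL_CLTV_DELTA + hops_after_next * CLTV_DELTA + offset


def guess_hops_after_next(height, expiry):
    """What an observer infers if it assumes the sender added no padding."""
    return (expiry - height - FINAL_CLTV_DELTA) // CLTV_DELTA


def random_offset(max_extra_hops=3, rng=random):
    """Sender-side padding that imitates between zero and a few extra hops."""
    return rng.randint(0, max_extra_hops) * CLTV_DELTA


height = 800_000
plain = outgoing_expiry(height, hops_after_next=2)
padded = outgoing_expiry(height, hops_after_next=2, offset=2 * CLTV_DELTA)
print(guess_hops_after_next(height, plain))   # observer reads the true distance
print(guess_hops_after_next(height, padded))  # observer overestimates by two hops
```

The padding costs extra lock-up time if the payment gets stuck, which is the tension with channel jamming that the talk points out.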

Fast Payment-Timing Correlation Tradeoff

Yeah, so it’s also important to note that, in general, in networks like Lightning, like Tor, these low-latency forwarding networks, what you’re really trying to do is hide in the noise. If you’re a routing node and you see, you know, 100 HTLCs at once, you might have a little less idea of which HTLCs are correlated with what other payment flows. For example, if someone’s making a payment once a week from their node to a given destination, and you monitor your Lightning node over the course of a few months, you might eventually be able to identify exactly that there is someone making this payment once a week from this source to this destination. But if there’s a lot of noise, if there’s a whole lot of other payments, maybe you can hide in the noise a little better. This only works so well if payments are moving very quickly, if there’s only so many HTLCs in a channel at once. If, for example, you run two nodes, you can correlate two different HTLCs because there was only actually one HTLC you saw on those nodes. That’s even with PTLCs. Let’s assume PTLCs, because with HTLCs you can see that obviously, since they share the same payment hash. But with PTLCs, if you’re running two nodes, you might not learn that they’re the same payment, but only if there’s a lot of other payments going on at the same time. So this one actually has a really easy solution. More people just need to use Lightning. More noise is good for privacy. Tell your friends to use Lightning so that you get some privacy. Easy.

Balance Probing Sucks

One last note on privacy. It turns out balance probing today is really easy, even without probing, literally just from monitoring your own payment successes and failures. At some point I got access to Cash App’s data. Obviously, Cash App sends a good chunk of payments. I got access to Cash App’s predictions for the balance range of one of my channels, and it was accurate to within a few hundred sats. So they had an almost exact understanding of what the balance in my channel was at that given time, to very, very high accuracy. And if you can do this en masse, which in fact you probably can, then you can watch payments flow through the network. If you know that these three channels suddenly had their balance shift in one direction by one million sats, then you can make a pretty damn good guess that a one-million-sat payment just flowed through those three channels. So this is really bad. It turns out there are some easy things we can do now to improve this. There’s probably more things we need to seriously consider later. There’s this idea called the Oakland Protocol, just because it was created in Oakland and engineers are bad at naming things. So if you’re operating a Lightning node, there should be a setting, hopefully there’s a setting, for the max HTLC value in flight. I mentioned it earlier, because if you reduce this number, it means you have lower risk if you go offline and you have to use your watchtower. You should also reduce this number because it improves everyone else’s privacy when routing through you: you can hide what the actual channel’s balance is, because users are only able to probe up to the max HTLC value in flight. If the max HTLC value in flight is less than your channel’s current balance, then users aren’t able to probe you, at least not as easily. There’s some other ways they can probe, but at least it makes it harder.
So you should change that number if your Lightning implementation does not default to less than half the channel value. You should nag your Lightning implementation to change the default to less than half the value. If you’re a node sending payments, you should prefer to route through nodes that are setting this value to less than half the channel value, because it improves your privacy. Cash App does this. It’s only a very small amount, but it does this. So you might make more routing revenue, because some nodes prefer to route through your channel if you set this value to less than half your channel value. So do that.
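Here is a minimal Python sketch of why capping the max HTLC value in flight blunts balance probing. It models a prober that binary-searches a single channel with probes that either get forwarded or fail at that channel. It ignores channel reserves, fees, and other HTLCs in flight, and the function name and amounts are assumptions made for the example.

```python
def probe_bounds(balance_sat, capacity_sat, max_in_flight_sat):
    """Return the (lower, upper) balance bounds a prober ends up with.

    The prober can only send amounts up to the in-flight cap, so the cap
    limits how much of the balance range it can explore.
    """
    probe_limit = min(capacity_sat, max_in_flight_sat)
    low, high = 0, probe_limit
    while low < high:
        amount = (low + high + 1) // 2
        if amount <= balance_sat:
            low = amount        # probe was forwarded: balance >= amount
        else:
            high = amount - 1   # probe failed here: balance < amount
    if low == probe_limit:
        # Every allowed probe passed, so the balance could be anywhere
        # between the cap and the full channel capacity.
        return probe_limit, capacity_sat
    return low, low


# Uncapped channel: the prober pins the balance exactly.
print(probe_bounds(700_000, 1_000_000, 1_000_000))
# Cap below the balance: the prober only learns "at least the cap".
print(probe_bounds(700_000, 1_000_000, 400_000))
```

When the balance sits below the cap, the prober can still find it, which matches the talk's caveat that the setting makes probing harder without preventing it.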

Channel Jamming

So I’m out of time. I’ve been out of time for a while, but there’s also a slide on channel jamming, and I don’t know. I got nothing. I’ll quickly run through this. As you know, channel jamming is just sending a bunch of HTLCs, sitting on them, not letting them expire, and causing the entire network to no longer be able to route payments, because there are just pending HTLCs everywhere, and we can only have so many pending HTLCs in a channel, and suddenly no one else can get a payment through. There’s basically two directions for addressing this, and it’s recently gotten a bunch of research. One is up-front payments, so users have to pay money up front to route a payment before they even learn whether it succeeds or fails. This has the problem of, is the fee going to be enough to increase the cost of the attack so that attackers can’t do this, and are payment success rates high enough that users aren’t going to get pissed off because their payment failed and they still had to pay the fee, just like Ethereum. The other approach that people are considering is reputation systems. So maybe you pay up-front payments sometimes, but if you’ve been doing a lot of payments, you can get some blinded token, and you can use that to say, hey, I’m someone who’s been doing a lot of payments, you can trust me, and then nodes trust you. There’s also the idea that, if an HTLC has been waiting too long and hasn’t expired, you prove that a channel went to chain, you show that you actually spent some on-chain fees as penance, and then your reputation isn’t burned. There’s a number of different ideas, there’s a number of research papers trying to cost out how bad the user experience risk is, how high the fees are, how much it prevents the attack. So it needs more research. There’s been a ton of great research on this very recently, so hopefully things are moving in a very good direction on this front. Thank you.
I am an optimist, I swear to God. I still love Lightning, but let’s be honest. Thank you. Thank you.
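The up-front fee tradeoff from the channel jamming slide can be put into rough numbers with a short Python sketch. The fee level, refill count, and success rate are made-up inputs, and the model is an assumption for illustration, not a proposal from the talk. The 483-slot figure is the BOLT 2 ceiling on `max_accepted_htlcs` per direction.

```python
MAX_HTLC_SLOTS = 483  # BOLT 2 ceiling on max_accepted_htlcs per direction


def jam_cost_sat(slots, upfront_fee_sat, refills):
    """Up-front fees an attacker pays to fill `slots` HTLC slots `refills` times."""
    return slots * upfront_fee_sat * refills


def honest_overhead_sat(upfront_fee_sat, success_rate):
    """Expected up-front fees an honest payer spends per completed payment."""
    expected_attempts = 1 / success_rate
    return upfront_fee_sat * expected_attempts


# Hypothetical: 1 sat up front per HTLC, slots refilled 24 times in a day.
print(jam_cost_sat(MAX_HTLC_SLOTS, upfront_fee_sat=1, refills=24))
# Hypothetical: the same fee when only half of honest attempts succeed.
print(honest_overhead_sat(upfront_fee_sat=1, success_rate=0.5))
```

Raising the fee makes the first number hurt the attacker more, but it raises the second number for everyone else too, which is the tension the talk describes.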

Q&A

Q&A moderator: Let’s do a Q&A real quick.

1st person: Okay, I have a quick question for you, as someone in the Lightning industry. When can we open up channels with people we don’t necessarily know, and they have cool, funny-sounding node aliases, and it’s like, oh, that guy has a lot of channels and stuff, maybe that guy’s cool. Is there going to be a point in the near future where it’s like, hey, I can feel a little bit more comfortable opening up with randoms that seem cool, like Harambee, maybe, or something?

Matt: Let me pull this up. Yeah, I mean, for the first three issues I think we have good ideas. Again, the first one we’ve got code for today; in most Lightning implementations it probably kind of works. For the second two issues, there are changes coming down the pipeline for Bitcoin Core. We’re going to get those hopefully soonish, you know, the next year or whatever. So that’s also very tractable. I think we can be optimistic about those three issues. And that’s kind of the immediate risk from someone who’s trying to be obnoxious and do some of these pinning attacks and whatnot on a small scale. So I’m optimistic about that in like the year-and-a-half-ish time frame. Maybe more, who knows. Antoine’s being pessimistic over there. A few years, whatever. Three years? All right, three years. The large-scale one is concerning, and it doesn’t have a clear solution. Hopefully it’s something that is mitigated by analyzing and seeing whether someone is opening way too many channels before it happens. Unclear how practical that is. But it’s not even an attack against you at that point, it’s against everyone. So yeah, I mean, the large-scale closure thing is really concerning. I don’t have a good answer to that. Next year what? Oh, next year’s talk. We’ll talk about that. Yeah, we’ll come up with an idea.

Q&A moderator: The solution to all the lightning problems from 2022. Did anyone have a question about the well-crafted sarcasm that Matt had during his presentation? Was anyone like, “oh, was he being serious at that point? Ah, actually that’s concerning or…” The N-squared one was good, I like that. Oh, sorry.

2nd person: Not really a question, but now to fiat.

Matt: Fiat is easy. You can just reverse all the payments and the merchant gets screwed, but the consumer is protected, it’s fine. Well, unless it’s the new, what’s the new one? The Zelle, unless it’s Zelle, in which case you get screwed anyway because the banks don’t want to reverse your payment. Good luck.

3rd person: In the mass closing case, does reducing your HTLC exposure help you? Or is that some other… What’s your personal risk of mass closure again?

Matt: That if the attacker has one of the channels with you, that you need to time out the HTLC before they can claim it on some other… It’s a race, right? They also have to get their transactions confirmed.

3rd person: So this wraps back to minimizing HTLC exposure, right?

Matt: No, it’s also because they can broadcast a stale state, and then if they wait, if they manage to delay the full day, they risk a bunch more funds that way, but if they manage to delay the full day, then they could theoretically take the funds at that point.

Antoine: The main issue, I’m assuming, is that you have a maximum amount in your fee-bumping reserve. The mempool is congested, and people are waiting to get into a block. So if the mempool backlog is longer than your HTLC timelock, and the max of your fee-bumping reserve is not enough to get through that mempool backlog for the number of HTLCs you have in flight, at some point it means your HTLCs are not going to get into a block before the expiration of the timelocks. So yeah, you have those three things to look at: your reserve, the number of HTLCs, and the mempool backlog.

3rd person: Can you summarize that for me, please?

Matt: So one issue Antoine pointed out is that when you are doing these kinds of anchor spends, this funding input right here, this existing input, is some reserve you keep on chain. You have an on-chain wallet with some extra funds to do these fee bumps. So you only have a certain amount of funds there. You want to keep most of your funds in Lightning where you can, but you keep some amount of funds in your fee-bumping reserve. How much? You’re only going to have some amount, and if the mempool gets very congested, you’re potentially going to run out if the fees go up, and then you won’t be able to get your transaction confirmed quickly enough. Or if the HTLC is not even worth spending that much fee on, then you might just write it off, and then you’ve lost money one way or another. Is that close?

4th person: Amazing talk. Do you think that attackers stealing funds from zero-conf channels, and the popularization of zero-conf channels is legitimate threat?

Matt: Zero-conf channels are almost, and I hope exclusively, used by wallets where you’re connecting to an LSP operated by your wallet vendor. So I think it’s kind of outside the issues here, because it’s really just always your wallet vendor, and if your wallet vendor really wanted to steal money from you, they really could. They can just ship an update, and your phone will automatically pull the new app and just send them your private keys. So I’m not super concerned about it. There’s a lot of regulatory questions about it. With a wallet, you don’t want to have very large risk. Ideally, people can run open source software, not on mobile phones. There’s a big challenge with open source software on mobile phones, because you’re just downloading a binary from the app store, and you don’t actually know whether it matches the source code you’re running. So there’s a number of, I think, more immediate challenges that make that a less interesting concern. And it’s hard to pass up the UX. It’s just users want that instant payment feel, even on first start, especially on first startup. So it’s a trade-off.

5th person: I had a quick question on Taro and basically stablecoins and newly minted coins operating on top of Lightning. What’s your overall opinion on that? Do you see that potentially leading to it clogging or congesting the Lightning Network for its primary purpose of just simply routing sats? Or do you think it will start on the periphery and we’ll just see how it plays out, what the users prioritize? Just your overall opinion.

Matt: Yeah, I mean, I have no idea how it’s going to play out. I don’t think anyone has any idea how it’s going to play out. I don’t see why it would cause issues within the Lightning network. At least if it’s deployed as currently described, it may be easier to deploy as a parallel network. But if it’s deployed as currently described, where you really have a channel with a market maker, and then they turned it into Bitcoin, and then it gets forwarded through, I don’t really see why this would be a concern to existing Lightning routing nodes or anything. You’re not really necessarily going to be exposed to much. Most people are trying to take advantage of the option implied in that, which means you’re going to see a lot more stuck HTLCs for a while. But yeah, I don’t think I’m really concerned about it from that perspective.

Q&A moderator: I want to ask one last question. I think there’s no other questions. Last chance. Second to last question, then. You mentioned Cash App earlier. Do you see any kind of meatspace agreements that are going to maybe get popular as a stand-in, or even as a long-term thing with Lightning partners, or channel partners? Or do you think that’s kind of against the ethos of the space? I’m just curious.

Matt: In the ethos of the space or not, there is an implied, if you know who your counterparty is, and they broadcast a stale transaction, and do a pinning attack against you, I think you’d have a very, very compelling case in front of a judge. I’m not a lawyer, but judges are humans, and they see the reality of, they’re not technical, they’re not going to sit there and parse out some technical bullshit argument about how it’s just in the system, and it’s in the rules of the system, and it’s fine. They see reality in front of them. They’re reasonable people, generally. So whether you have an agreement or not, I would imagine you probably don’t want to perform some of these attacks if people know who you are, because you might very well lose a lawsuit.

6th person: So as far as gossip goes, channel gossip uses a lot of bandwidth, with redundant messages and stuff. I believe there’s research into Minisketch. Are there any other solutions for cleaning up the constant relay and messaging?

Matt: Is Alex here? There’s a Core Lightning guy who’s working on this. I don’t know if he’s here. Oh man, I’m going to give him shit for missing my talk. Yeah, he’s done some research on, some of the Core Lightning folks have done some research on using Minisketch in lightning gossip. There was some debate as to, there were a few different approaches we could take. We had a little bit of back and forth on it. I don’t know where they landed, but probably that’s something that will happen eventually, to rewrite gossip to use something like Minisketch. And hopefully that reduces the bandwidth cost a lot.

Q&A moderator: Thank you so much, Matt.