Trampoline Routing
Speakers: Bastien Teinturier
Date: October 20, 2019
Transcript By: Michael Folkson
Tags: Trampoline payments, Amp
Media: https://www.youtube.com/watch?v=1E-KhLA6Gck
The Lightning Conference Day 2 (Stage 2)
Slides: https://docs.google.com/presentation/d/1bFu33_YFsRVUSEN-Vff5-wrxMabOFl2ck7ZTiVsp5ds/
https://twitter.com/kanzure/status/1221100865328685057
Intro
We will start with gory technical details about routing and pathfinding. I hope I will be able to keep you awake for the next twenty minutes. I’m working for ACINQ on the Lightning Network. If you don’t know ACINQ we are contributors to the specification. We run the biggest node on the network and we are also one of the leading mobile wallets, eclair-mobile. We also just teased a prototype of our next wallet Phoenix on Twitter. If you haven’t seen that yet check out our Twitter account. What I’m going to be talking about for the next twenty minutes is how we plan on improving the mobile app experience with trampoline routing. I can’t talk as fast as Laolu and we only have twenty minutes so I will stay at a high level overview. If you have questions or want more details there are links at the end to the full specification. You can talk to me anytime.
The Lightning Network, a Payment Channels Network
So first of all what is the Lightning Network? I’m hoping you already know that so that is going to be very quick. The Lightning Network is just a Layer 2 solution, a payment channel network meant to help scale Bitcoin payments. You just create a 2-of-2 multisig on the Bitcoin blockchain and then you are able to update balances of your channel completely offchain by just sending peer to peer messages between people in the channel. The only important point in that slide is that there is network in the title. That’s really important because if you had to open a channel to everyone you wanted to pay that would be quite limiting. What we can do is just route through other people’s channels in exchange for fees. This way you can get anyone you can find a route to in the network graph.
Payment Routing Features
So payment routing today in Lightning offers quite a lot of interesting features. I want to highlight them briefly. First of all right now we are using source routing. This is nice. This means that the sender decides on the whole route that the payment will take. This gives a lot of flexibility to the sender because by having full control on the route that we take, the sender is able to optimize for fees if he wants to pay the least fees by just running a pathfinding algorithm to minimize for fees. Or he can choose to optimize for privacy by using a longer route and paying more fees. Overall it gives lots of flexibility to the sender. Another interesting and important property of payments in the Lightning Network is that they are trustless. They use HTLCs for that. Udi did a great job yesterday at not explaining how HTLCs work. I’ll do exactly the same. I don’t have a fancy meme but if you want to know how HTLCs work it is going to be for another talk. But it allows us to route in a trustless way with multihop payments. A very important property of payment in the Lightning Network is that they are private. We are using onion encryption for that based on a crypto scheme named Sphinx that I think was released in 2009. This allows us to make sure that people that are intermediaries in our payment route don’t know where they are in the payment route. They don’t know if they are the first one, the second one, the third one and they don’t know who the sender or the recipient is. It gives you a bit of anonymity when you’re writing your payments. You are not telling the whole network who you are paying and how much you are paying. That is something we definitely want to keep. All of those come at a cost for the payer. You need to sync a big network graph which means it costs bandwidth. You then need to store that which costs storage on your small mobile device. Then you need to run a pathfinding algorithm to find routes to pay whoever you want to pay. That costs a lot of CPU which drains your battery. It is also horrible for the UX experience because you launch the app, it has to sync all the network updates that it missed to be able to find that payment route so it is going to be really bad. We want to improve that. That brings me to the cool slide. The one with the potential impossibility result. The Lightning payment triangle of success.
Lightning Payment Triangle of Success
Basically I think that in peer to peer decentralized routing networks we can only achieve two of the three properties of this triangle: privacy, route cost optimum (which means you optimize your fee and pay the least fee that you can pay for that payment) and performance. Right now we are at the left edge here where we have good privacy and good route cost optimum. This means that we can route payments privately by paying just a small amount of fees but at the cost of performance. Note that we are not exactly at the center of this edge and we can decide to be anywhere on this edge right now because you can choose to optimize for fees by using smaller routes or you can choose to use longer routes and pay more fees for more anonymity. Of course we don’t want to be on the edge of the bottom because we don’t want to sacrifice privacy for reasons. What I’d like to focus on is the edge on the right side. The one where we give away some of the route cost optimum properties for better performance. We keep privacy but we’re ready to pay a bit more fees to have a better experience especially for mobile devices. First of all, let me explain what led us to this need. As I said before we currently operate both the biggest node on the network on the server side and one of the leading mobile wallets. That allows us to get a good real view of what works and what doesn’t work for Lightning today. Our server node has been running for more than a year without any issues so on the server side everything works fine. But we have started to see the mobile apps’ experience degrade a bit because of the network growth and because of more updates to sync and a bigger graph to run pathfinding on. We will need to address that and do something about it.
Let a billion payment channels bloom
When you are running a mobile node on your phone of course you are offline most of the time which means you are missing a lot of information. When you are taking your phone out of your pocket and you want to pay for something, you have to do a lot of things before that. You have to first connect to a peer, sync all the updates that you have missed about what happened in the network to get a good enough view of the graph to be able to find a route. This consumes a lot of bandwidth. You could be on a mobile plan and you don’t want to consume that much bandwidth. And it takes time. That means you have a spinner running and you can’t just pay, you have to wait for that to finish syncing. Then once you are done you also need to run a pathfinding algorithm to find the route to your destination. If it is some guy at the other side of the graph, this is going to take time as well, consume a lot of CPU, drain your battery and you’re still not paying. You’re still seeing a sliding thing saying you have to wait. That’s bad. We want to do better than that. When we think about mobile payment apps we think of things like Apple Pay, Venmo or WeChat. What we want to do is take our phone out of our pocket, scan a QR code and voila the payment is sent instantly. If we want to get mainstream adoption of Lightning that’s who our competitors are. Our UX should be no different, we should have the same kind of UX and it should not be that painful for users to just pay something with the Lightning Network. The easy way to just improve performance is to stop syncing the whole graph. If you don’t sync the whole graph then that means you don’t have to consume bandwidth to sync things that are far away from you. This means that you have a much lighter pathfinding algorithm to run because the network graph is going to be a lot smaller. Overall it is going to improve performance a lot. Of course if you don’t sync the whole network how do you find a route to your destination? If you don’t know how to reach the guy then you can’t pay either. You have something that is faster but you can’t pay. The idea is to use what we call trampoline routing.
Trampoline Routing
You are going to simply defer building parts of the route to some node in the network. Not the whole route because that would be the same thing as a custodial wallet but only parts of the route. You’re going to keep the same privacy properties that you care about. The idea is that you are only going to sync the part of the network that is close to you. For example your two radius neighborhood. If we have a node in the center here in light blue we’re going to sync nodes that are a distance of two from us. Or we could change three or four depending on what bandwidth we have available on our phone and how often we are online. Let’s choose two. That means we have a much smaller graph to sync and analyze when we’re trying to pay. We’re also going to be syncing some trampoline nodes from the network. That could be remote, that could be in your local neighborhood. We are going to sample them probably randomly. Those are the nodes in dark blue here. When you want to make a payment you are going to first choose a trampoline route. That means you are going to choose a few trampoline nodes that you are going to use to reach your destination. For example, hear node 1 and node 2. Let’s imagine that our destination is that node in grey. We are choosing node 1 and node 2 and committing to using that route. We make it so that the first trampoline node is in our neighborhood. That way we can find a normal payment route to the first trampoline node and send him what looks like a normal payment. When it reaches that trampoline node he is going to discover instructions to route to node 2. He is going to run the pathfinding algorithm for us to find the route to node 2 and to forward our payment to node 2. When node 2 receives the payment he will peel the onion and discover instructions to route to that node in grey. In fact he will run the pathfinding algorithm to find the route to that node. It repeats. You can use as many trampoline nodes as you want, there is a limit of course. You repeat until you reach the destination.
A tale of onions
The thing that makes that possible is a change we made to the protocol recently which is a variable length onion and I will detail that in that slide. Before that a few months ago what we had in the onion construction was 1300 bytes onion divided in 20 frames of 65 bytes. That meant we had 65 bytes per intermediate node in the network and we couldn’t put much information in there but we could make 20 hops through the network. Now we’ve changed that so we can put as many bytes as we want for each intermediate node. What we do with that is we are putting in the payload, the big onion payload, a smaller onion payload which is in orange here. Here Alice wants to pay Bob. Alice is choosing trampoline node 1 and 2 to pay Bob. Alice is first creating a small onion in orange. You need to start reading it from the bottom. It says “Forward to node_id bob a payment with that expiry and that amount (1000 sats).” She encrypts that for node 2. Then she adds on top of that instructions for node 1 to route to node 2 saying “You should route to node_id t2 with that expiry and that amount.” That makes an onion that can only be decrypted by node 1. Then Alice is going to run a pathfinding algorithm but on a much smaller graph to find a path to node 1. She is going to put that orange onion in a normal onion that says to the first grey node “You should forward 1045 sats through that channel_id 42.” What is interesting here is that the first node in grey is just receiving a normal onion. He doesn’t know that trampoline is used, he doesn’t have to be updated to understand trampoline. It is just a legacy payment and this can be a legacy node. But when it reaches node 1 he is going to be able to decrypt the top level onion and verify that he was supposed to receive 1045 sats. Then he finds that there is a smaller onion in there that he can also decrypt. Decrypting that small onion gives him information about what he needs to send to trampoline node 2. Now he runs the pathfinding algorithm and does the same. He puts that small onion in a bigger onion and sends that to node 2. It is going to repeat at node 2 until it reaches the final destination. This is effectively trading fees for performance. Alice here didn’t have to run anything expensive. She had to run a pathfinding algorithm on a graph that has a depth of only 3. This is really simple, she can completely brute force that and it is still going to run in less than 100 milliseconds. It is node 1 that is computing a route and node 2 that is computing a route. From that they take a bit more fees than what they would have otherwise taken with normal relay. There are a lot of features that kind of come for free once we have trampoline that I think are interesting. For example, AMP.
AMP and Trampoline
Right now when we are doing AMP in the network with normal payments the sender decides on how to split the payments and every intermediate node just blindly forwards those shares. That means the sender has to do an even harder job of finding routes but also finding a correct way to split a payment into the network. If you are a mobile node that means you will have to make something that is even more expensive than what we have to do today to be able to send an AMP payment. But with trampoline we can do something that is much smarter than that because we can aggregate multipath payments at each of the trampoline nodes and then decide how to resplit to reach the next destination. For example, if Alice on the left wants to send 0.5 Bitcoin to a destination and she only has smaller channels, I ignored the fees on this slide because it is easier. She is going to just decide that she is going to use Bob and another guy as a trampoline node and she is only going to care about how to send that 0.5 Bitcoin to the first trampoline node. Since she has a much smaller graph of only depth 3 she can very easily find an optimal split to reach that guy. She is splitting it into 3 shares here for example. That trampoline node is going to wait to receive all the shares and once he has everything and has the destination he knows that he has a better way to reach that guy because he has quite big channels. He can split it into 2 paths instead of 3. Splitting into more paths also costs more fees. If you are able to send a payment via only one path it is always better. He is able to relay to the second node with only 2 splits and the second node has a big channel to the destination so he doesn’t have to split at all. This is a lot more efficient because each node has more information about others, about his local surroundings because he knows the balances of his own channels. Whereas Alice wouldn’t know the balances of Bob’s channels so would not know that it is possible to use this channel or this channel because the balance allows it. So trampoline allows for much better AMP than the normal AMP that we have today. It is really easy to build on top of trampoline in the existing code. There are also other features that we can do more easily with trampoline like rendez-vous routing. I haven’t detailed that yet but I will send a mail to the mailing list soon about that. What is interesting about that change is that at first it looks like a very big and scary change. We are basically moving from self routing to kind of packet switching so it looks really complex and hard to add to the codebase but in fact it is not. Because we are reusing the onion construction but just with smaller onions it is an incremental change that we add on top of the existing payment infrastructure. It is quite easy to develop and add to our codebase. It doesn’t change anything in the way we route HTLCs. HTLCs just work independently of how we do the routing underneath. That is important. If you have seen the Phoenix demo, that is what allows Phoenix to send payments instantly and not have a waiting bar when you launch the app. We have currently a prototype of that deployed and we’re working on finishing and making it production ready. We are going to deploy that to our node. You are going to be able to use that soonish. An important point is that at some point maybe the network will grow to be huge and we cannot ask any routing node to be able to compute a route to anyone else on the network because the network is going to be too big. If that happens, I’m not saying it will but if that happens we are going to have to somehow divide the network into zones. When we do that having a kind of packet switching like trampoline makes it really easy to move to something like that. I’m not saying we are going to have to do it some day or even in the far future but at least we are taking an incremental step towards that direction to be able to do it if we need it some day.
To Infinity and Beyond
That is all I had. This is a high level view. Of course there are a lot of gory details that are not completely fleshed out yet and we would love to have more feedback on the proposal if you have ideas. There are parts of it that we are not completely happy with, especially when dealing with legacy recipients that do not support trampoline. If you want to look at the proposal and add your ideas or contribute we would welcome that. There is currently a spec PR that we will update soon because some small things have changed. I sent a mail to the mailing list a few months ago to detail the high level view of how we think it is going to work. Have a look at that and don’t hesitate to put comments in there or just come and reach me to ask any questions. I think we have two minutes for questions.
Q & A
Q - My question is about privacy in trampoline routing. How do you prevent let’s say the NSA running a large fraction of all trampoline nodes?
A - Because running a trampoline node is not that expensive. Right now when you are running a routing node your CPU is mostly idle because the only thing you are doing is that you are just taking an incoming packet, decrypting an onion and it already says what channel you need to route to. If you are not a really nice routing channel you don’t even have to keep the network graph. You can ignore that and just decrypt forward. This doesn’t use the CPU at all. What is good with trampoline is that it incentivizes routing nodes to keep the network graph and also make sure that it is possible to reach any node in the network. I think that since right now we are not using your CPU, this is lost capex, you should use that. If you can collect more fees by using your CPU to run pathfinding for people it is even better. I think that every serious routing node that is running on a server machine and not a Raspberry Pi and not some crappy hardware, every real routing node has an incentive to run trampoline because it is a good way to earn more fees while providing useful services for the whole network. I am expecting that there are enough people to counterbalance what the NSA would run. Even if the NSA runs most of the trampoline nodes it is the same as today. They would need to have all the trampoline nodes in the route because since we are using onion encryption for the trampoline onion as well, a trampoline node here doesn’t know where he is on the trampoline route. It is a smaller route than the 20 hop route that we can have in the whole payment, I think it is 5 hops max but still you don’t know where you are in that.
Q - Does the last trampoline node discover who the recipient is?
A - If the recipient supports trampoline and knows how to decrypt this packet, no. The last sender just sees a normal onion packet and sees that he is supposed to forward it to that next guy. If the recipient does not support trampoline then yes you are losing privacy. But it is kind of the same as today. If the next to last node in the route is forwarding to an unannounced channel for example, or someone he knows is not routing, he knows that this is the destination. We have the same kind of trade-offs, a bit worse when the recipient is a legacy node. If the recipient supports trampoline we have ways to make it as good as today. That is an area where we would like to see more ideas and more ways of analyzing that we’re not losing any privacy.
Q - If you are giving up custody of your route to a third party. Let’s say every Sunday I go to church and I make a donation to my church. If I give that pathfinding to a third party then they can route through somebody that will use that information?
A - It depends because you still choose your trampoline node. It is just that you are choosing less of the route. You are not choosing the full route but you are choosing trampoline nodes and you can rotate them randomly. Also it is not replacing the existing routing algorithm. You still have the choice to use either of those. If you want to optimize for fees but have a slower boot up time before payment you can still do that. It is when you want to optimize for performance that you would use trampoline routing. If you are doing something that you think may be subject to analysis then you should use eclair-mobile instead of Phoenix and compute the whole path for yourself. It is just about giving more choice. On that triangle we currently only have the edge on the left and we are just giving you the opportunity to use the edge on the right. We are not forcing people to do that. You still have the opportunity of doing both of those depending on what your need is.