Lightning Network Deep Dive
Speakers: Olaoluwa Osuntokun
Date: July 18, 2016
Transcript By: Michael Folkson
Tags: Lightning
Media: https://www.youtube.com/watch?v=tLZc-NLmV20
Slides: https://docs.google.com/presentation/d/1OPijcjzadKkbxvGU6VcVnAeSD_DKb7gjnevBcqhPCW0/
https://twitter.com/kanzure/status/1052238088913788928
Introduction
My name is Laolu Osuntokun and I work with the Lightning guys. You may know me as Roasbeef from the internet: on GitHub, Twitter, everything else. I've been Roasbeef for the past 8 years and I'm still Roasbeef, that's how it works. I'm going to give you a deep dive into the Lightning Network. I remember about two years ago, when I was learning about Bitcoin initially, I went to these Bitcoin dev meetups in the Mission, and there was a deep dive and Taariq was the MC there, super friendly guy. I'm happy to give back to the community now with my own talk; it has come full circle at this point.
Talk Outline
Quick outline of the talk. First, I'll give a high level review of the Lightning Network. I won't be doing Lightning 101 because this is meant to be a deep dive talk, more into the core technology. Then I'll go over some updates we've made to our scripts and the revocation protocol within them, and then I'll go over something I call LCP, the Lightning Commitment Protocol, which is exactly how two nodes update the commitment transaction with HTLCs flying back and forth, super fast. I'll also talk about routing for a little bit. I recently collaborated with BitFury on a paper where we presented a possible solution to routing; I'll talk about that paper in detail and then about what we may do initially as a stopgap before we do some more advanced stuff.
Lightning from 10K feet - Bidirectional channels
So, Lightning from 10,000 feet: bidirectional channels. In the regular payment channel you have a 2-of-2 multisig escrow where Alice and Bob get together. They put in 1 Bitcoin each, or maybe Bob has 1 and Alice has 0, and they put that into the escrow itself. In order to do this protocol safely, because we have nested transactions with the funding transaction and the commitment transaction, we require a malleability fix, and the one we currently use within our software is SegWit, or Segregated Witness. If you don't have this malleability fix you can run into some odd areas where maybe Bob malleates the transaction, Alice's funds are stuck, and then Bob basically says "if you want your money, you give me a little bit more". With a malleability fix we can avoid that. Another thing is CLTV and CSV: CLTV meaning an absolute timelock and CSV a relative timelock. Using these two together we can have channels with unbounded lifetime, meaning they never need to close. So we can keep a channel open for a year or two if we really want to. Otherwise, with only a CLTV timeout, we have a channel open for one or two days and then we need to close it out and that's the extent of the channel. Using this new design we can have channels open indefinitely.
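To make the CSV mechanics concrete, here is a minimal sketch, in Go with the btcsuite txscript builder we use, of what a delayed, revocable commitment output could look like. The opcode layout here is illustrative only, not the precise script our daemon produces, and it assumes txscript exposes the OP_CHECKSEQUENCEVERIFY constant:

```go
package commitscript

import "github.com/btcsuite/btcd/txscript"

// commitmentScript sketches a revocable, delayed commitment output: the
// broadcaster can sweep with localKey after csvDelay blocks, while the
// counterparty can sweep immediately with revocationKey if this state
// was already revoked.
func commitmentScript(revocationKey, localKey []byte, csvDelay int64) ([]byte, error) {
	builder := txscript.NewScriptBuilder()
	builder.AddOp(txscript.OP_IF)
	// Revocation path: spendable at once by whoever holds the revocation key.
	builder.AddData(revocationKey)
	builder.AddOp(txscript.OP_ELSE)
	// Delay path: the broadcaster must wait csvDelay blocks (CSV) first.
	builder.AddInt64(csvDelay)
	builder.AddOp(txscript.OP_CHECKSEQUENCEVERIFY)
	builder.AddOp(txscript.OP_DROP)
	builder.AddData(localKey)
	builder.AddOp(txscript.OP_ENDIF)
	builder.AddOp(txscript.OP_CHECKSIG)
	return builder.Script()
}
```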
Balance updates themselves are basically a two-party iterated protocol where we try to do all the updates in an atomic manner for the most part: I push Alice money, she accepts this new state and then revokes the old state. It is important that Alice revokes the prior state because otherwise Alice can go back to the state where she has the most money. In order to do that revocation we basically rip up all the prior states such that if Alice tries to broadcast a prior state then I can punish her essentially. The way I punish Alice is I use the Bitcoin blockchain itself as an invisible trusted third party. You can say TTPs are bad, we want to avoid that in Bitcoin, but you can view the Bitcoin blockchain as an invisible trusted third party in the sense that it only steps in when it is needed. If Alice tries to cheat me and broadcast State 1 when we're at State 2, I go to the Bitcoin blockchain, this invisible third party, and it acts as the judge. The judge basically enforces fairness for us, and the judge adds a clause within this fairness protocol which allows either side some sort of period where they can attest to the current balance. So if I'm Bob and I want to be sneaky and we're at State 5 and I try to broadcast State 2, I can't immediately sweep all the money. I have to wait some sort of delay period, maybe a day or two, maybe a week. This delay period allows Alice to present to the judge, the blockchain itself, a better proof of the current state of the channel. Because every time we step forward I revoke a prior state, if I ever go backwards, Alice or Bob or whoever is my counterparty can present this proof to the judge showing I violated the contract, and Alice can then claim the entire balance. She gets all the money essentially. So it is kind of this scenario where both sides always move forward, and if you ever try to move backwards you basically get punished and lose a lot of money.
Lightning from 10K feet - Chained commitments
So that's how we handle the bidirectional payment channel aspect, which allows both sides to send money back and forth. But one of the main things was the concept of the HTLC, the hash-time-locked contract. What an HTLC says is: I'll pay you two Bitcoin, insert denomination, if you can open this commitment within one day. If we're doing a single hop then Bob has the commitment; he can open the commitment, which means he has the preimage to that commitment, it gets opened and then I pay out to Bob. This is basically a timelocked commitment scheme. Previously, the way people tried to solve this was to assume the existence of timelock encryption, where Alice has the preimage and encrypts it in such a way that Bob, if he's grinding on all four cores for like two days, can actually decrypt that commitment and then get his money. But with Bitcoin we have cool things like time based opcodes, and we can use them directly in the system. We don't need to rely on dubious timelock encryption that may not exist at all. So if we just do one hop that's cool, but we can do multiple hops across the network. When we do multiple hops we use these HTLCs in a chained manner where every single hop decrements the timelock. As you decrement the timelock, that gives each party a window of opportunity in which, once they get the preimage, they can settle all the way backwards. So initially we clear the HTLCs, meaning that everyone updates their balances, everyone has 1 Bitcoin less, all the way to Charlie, and at that point Charlie settles the HTLC by revealing his preimage and that gets settled all the way back to Alice. Everyone pulls the money and everything is fine. We can do this with arbitrary path lengths essentially. Obviously if you have a longer path length maybe there are some issues with the time value of money: how long do I want my money locked up? How many fees do I pay? Using this you can have end-to-end security with HTLCs, meaning that Bob can't get paid before Charlie gets paid and so on. That's how we do these chained commitments, the network part of the Lightning Network.

A little about commitment schemes. For Lightning itself we require secure commitment schemes. A secure commitment scheme has two traits: the first is called hiding and the second is called binding. Hiding means that given two commitments you can't tell exactly what was committed to. If you're familiar with randomized encryption, counter mode AES, or the IND-CPA security definition, this is the exact same thing. So with me extending the commitment towards Bob, Bob should have no idea exactly what was committed to. If he knew what was committed to he could just take that money and Alice would never get the money. The next one is binding, meaning that once you commit to a certain value you shouldn't be able to open that commitment with two distinct values. If you could do that you would have a collision on the hash function that we use, and if you can collide then you can present multiple input values and pull the money twice. Currently in the scheme we use SHA256, but you can use any other commitment scheme that satisfies these two properties of hiding and binding. If you assume a random oracle, which means the hash function gives you a perfectly random value every single time, you can use that to construct this scheme, and it is perfectly hiding and computationally binding, meaning that you can find collisions if you can do 2^128 work, but we assume that is computationally infeasible.
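As a sketch of those two properties, here is a generic hash commitment in Go, assuming nothing beyond SHA256 and a random nonce (in the HTLC case the random preimage itself plays the nonce's role):

```go
package commitment

import (
	"bytes"
	"crypto/rand"
	"crypto/sha256"
)

// Commit commits to value: the random nonce gives hiding (the commitment
// reveals nothing about value), and SHA256's collision resistance gives
// binding (no second (value, nonce) pair opens the same commitment).
func Commit(value []byte) (com, nonce [32]byte, err error) {
	if _, err = rand.Read(nonce[:]); err != nil {
		return com, nonce, err
	}
	com = sha256.Sum256(append(nonce[:], value...))
	return com, nonce, nil
}

// Open checks that (value, nonce) is a valid opening of com.
func Open(com [32]byte, value []byte, nonce [32]byte) bool {
	expected := sha256.Sum256(append(nonce[:], value...))
	return bytes.Equal(expected[:], com[:])
}
```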
Commitment Scheme Updates
So one thing I mentioned before is the revocation scheme. For every single state you move forward, you need to provide me a revocation of the prior state, meaning that if you ever go back to State 1 when we're on State 2, I can take all your money. As you can see this grows linearly: if we're doing a billion updates I have to store a billion values essentially. This can be limiting because I need a lot of storage to do it. So how do we solve this? We solve this with a compact revocation scheme. The goal here is that the sender, meaning the person who is doing the updates, should only have to store a constant amount of information: the root of a special tree and the current update counter. The receiver only stores log n values; if we're doing a million updates, that's only about 20 values, super efficient. The way we do this, we use something I call an authenticated PRF, meaning a pseudorandom function where, when you give me the outputs of the PRF, I can be sure they came from the PRF you created with a particular seed. So for every single output I get, I check the proof, and if it is a correct output we keep moving forward. The structure we use to do this is something we call ELKREM. It is basically a reverse Merkle tree. Typically with a Merkle tree you have the leaves at the bottom, you hash all the leaves upwards and you finally have the root, and I can prove to you that any of the leaves is part of the tree in log n space. But instead we go in the other direction. We start with the root. Alice has a certain secret and a hash based key derivation function; in the current code we use the HMAC key derivation function. It is a recursive structure: you have the root at the top, and if you want to go down to the left node you hash the root and 0; if you want to go down to the right node you hash the root and 1. If we're in a tree of height 2 then I hash twice and give you the first leaf. As we continue, I give you the second leaf, I give you the third leaf, and you can check the proof each time. At some point you can store only a parent node and derive the first or the second leaf from it based on the hashing in the scheme itself. One of these trees is maintained by both the sender and the receiver: the sender is sending out values, the receiver gets each hash value and checks it is properly in the tree. If it can truncate its storage, because it now has a parent node that covers some leaves, it truncates and we keep going. If I go back and broadcast State 9000, you can recover the revocation value for State 9000 in log n hashes: you take one of the intermediate nodes, hash down to the left and right until you have the proper node, broadcast that out, take my money, and everything is fine.
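Here is a minimal sketch of that reverse-Merkle derivation. The real ELKREM code uses specific hash tweaks and tree bookkeeping, so treat the exact hashing here as illustrative:

```go
package elkrem

import "crypto/sha256"

// child derives the left (bit 0) or right (bit 1) child of a node by
// hashing the parent value together with the direction bit.
func child(parent [32]byte, bit byte) [32]byte {
	return sha256.Sum256(append(parent[:], bit))
}

// deriveLeaf walks down from the root to a leaf. The sender stores only
// the root and a counter and can regenerate any leaf; the receiver keeps
// O(log n) intermediate nodes and re-derives the rest the same way.
func deriveLeaf(root [32]byte, height uint, index uint64) [32]byte {
	node := root
	for i := int(height) - 1; i >= 0; i-- {
		node = child(node, byte((index>>uint(i))&1))
	}
	return node
}
```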
Additionally, another update we've made is in the revocation scheme itself. Initially the revocation scheme used a chained BIP32 hierarchy. Then it went to hash based revocations (Adam Back and Rusty). Now we're combining those two and using revocation keys themselves. Revocation keys are kind of an optimization. We realized that in the script this allows us to compress the scriptSig, meaning what we use to redeem the script, and also the length of the output script, and we can save an extra hash revocation. So rather than Bob presenting a secret preimage value in order to do revocation, Bob presents a key. This key is derived in a certain way such that I can give Bob a key and he can be sure that if I later give him a secret value, he can compute the private key to that corresponding public key. Say C is Bob's commitment key and p is my preimage for that particular state. For Bob to create the next state, I give him this revocation key. What I do is take his point, his public key in the channel, then I create a point from the preimage, G*p, and add that to Bob's point. What Bob gets is the revocation key R = C + G*p, and he uses that and everything is fine. When I move to the next state, I give him this p value, and once he has p he can derive the private key corresponding to the revocation key. This uses a trick in elliptic curves: the addition operation is commutative and scalar multiplication distributes over it. C is G*c where c is Bob's private key, so R = G*c + G*p, and because you can undistribute the math there you get R = G*(c+p), so (c+p) is the private key for R. When I give Bob the preimage, he takes his regular private key for his commitment key, adds the preimage to it, and then he has the private key for the revocation key. As you can see it simplifies down to the exact same solution, and with this we get slightly more compact scripts and everything is slightly more efficient. If you are familiar with pay-to-contract-hash, this is the same trick: a customer wants to buy something from a merchant, there is an initial contract hash, and you use it to derive a payment address for the merchant. You can then say in court that the merchant knew about the contract, because they couldn't have derived the private key if they didn't have the contract itself. It is also used in vanity address generation, where you can give a vanity address generator a single key and they can add points together; point addition is much faster than elliptic curve scalar multiplication. You can also do a trick called double scalar multiplication to speed things up, where you calculate G*c + G*p in a single step rather than computing each individually. That's another optimization you can exploit there.
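A small sketch of that derivation using the btcec library from btcsuite; the variable names are mine:

```go
package revocation

import (
	"math/big"

	"github.com/btcsuite/btcd/btcec"
)

// revocationKey computes R = C + p*G: Bob's commitment public key C
// tweaked by the preimage p. Alice can hand Bob R without revealing p.
func revocationKey(Cx, Cy *big.Int, p *big.Int) (Rx, Ry *big.Int) {
	curve := btcec.S256()
	pGx, pGy := curve.ScalarBaseMult(p.Bytes())
	return curve.Add(Cx, Cy, pGx, pGy)
}

// revocationPriv shows why revealing p revokes the state: since
// R = c*G + p*G = (c+p)*G, once Bob learns p he holds the private key
// r = c + p (mod N) for the revocation public key R.
func revocationPriv(c, p *big.Int) *big.Int {
	n := btcec.S256().Params().N
	return new(big.Int).Mod(new(big.Int).Add(c, p), n)
}
```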
LCP (Lightning Commitment Protocol)
So the next thing we're going over is called the Lightning Commitment Protocol. This is essentially the link-layer protocol between two nodes: the protocol the nodes use to update the commitment transaction in an efficient manner. We have HTLCs flying across, we're sending HTLCs, they get logged in the commitment transaction and eventually Bob settles them. We want to make this extremely fast, because if you optimize the link-layer protocol for throughput then, if all the links are optimized, the network as a whole will be optimized in terms of total throughput. We had a few goals when we were designing this protocol. One thing is that we want to batch updates. If I have ten HTLCs that I want to send, I don't want to wait for ten different updates; I want to say I'm putting all ten of these HTLCs onto the transaction at once, you ACK those changes and we move forward. Another thing is that I want to pipeline new changes in a scenario where we have very high throughput bidirectionally. I want to be able to queue up new changes before you even get back to me, because otherwise I have to stop and wait, and if I'm waiting I could be forwarding HTLCs, I could be making more money, we could be helping the network. Another thing is that we want to desynchronize updates, meaning that with Alice and Bob connected to each other, we don't want to only let Alice do an update, then only let Bob do an update. If you imagine a network that is well connected with updates going in both directions, they need to be able to create updates independently of one another. If we add desynchronization, for the most part you can eliminate blocking, and if you eliminate blocking and you're desynchronized, you can allow batched and pipelined updates: you get very, very high bidirectional throughput. LCP in a nutshell: the thought model I'm using right now is that I think of lightning channels basically as shadow chains, meaning they're kind of like blockchains. With lightning there is asymmetric state, meaning I have a commitment transaction and you have a commitment transaction. You can imagine that I can add updates to the state, and that's basically like the mempool: add this HTLC, add this one and this one, that's in the mempool. When I want to commit, extending the chain by 1 to move to a new state, I reference the mempool: take those 5 HTLCs I added, and this is the new state itself. This allows me to keep adding new changes in a desynchronized manner. We have something else called the revocation window. The revocation window is kind of like a sliding window in TCP where I can keep adding new updates until I exhaust your window. This lets me add new updates, in a desynchronized manner, before you even reply to me. If I get to the end of the window, I wait, you ACK those changes, and then I can continue forward. There's like a timer there, and maybe I can also batch those updates into one. The revocation window also acts as flow control: if I'm doing updates faster than you can keep up with, I stop for a second, I wait, I let you ACK those things and we move forward. In the current protocol the revocation window is fixed, maybe like 4 or 16, but you can imagine doing more advanced things where, if I'm moving faster, you let me extend the revocation window by 2, maybe we shrink it, and so on.
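One way to model that window, a sketch only, is as a semaphore pre-filled by the initial revocations:

```go
package lcp

// revocationWindow models the LCP's sliding window: the sender may create
// up to `size` new commitment updates before it must block and wait for
// the receiver's revocations, which both ACK old states and replenish the
// window. This mirrors TCP-style flow control.
type revocationWindow struct {
	slots chan struct{}
}

func newRevocationWindow(size int) *revocationWindow {
	w := &revocationWindow{slots: make(chan struct{}, size)}
	for i := 0; i < size; i++ {
		w.slots <- struct{}{} // pre-populated by the initial revocations
	}
	return w
}

// consume takes a slot before sending a new commitment update; it blocks
// if the window is exhausted, throttling a sender that outpaces its peer.
func (w *revocationWindow) consume() { <-w.slots }

// replenish returns a slot when the peer revokes a prior state.
func (w *revocationWindow) replenish() { w.slots <- struct{}{} }
```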
Alright, so I have a quick demo, sorry, a diagram, to show the way the protocol works. At the top of the screen we have Bob's commitment chain, where he is only at State 0. At the bottom we have Alice's chain. In the middle we have the shared log. You can treat the log as a mempool, and the log for the most part is append-only; it can be compacted once we know certain entries are no longer needed. In this example we have a revocation window of size 2, meaning Alice can create 2 new updates without Bob ACKing those updates. Initially Alice adds some HTLCs; she adds another one and another one, so we now have 3 HTLCs in the log and they're uncommitted. This means that both sides have them in their log, but they haven't yet updated the new state to reflect the HTLCs. Then Alice creates a new update to extend Bob's commitment chain. When Alice sends over the update, she sends over a signature and also a log index, which indexes into Alice's log and says this commitment signature includes every single HTLC, every single update, below this log index. Bob then constructs the proper view, and a view is essentially a commitment transaction with the proper HTLCs and the balances reflected accordingly. So because we have adds A0, A1 and A2, Alice gets her balance decremented by the amount of each of the HTLCs, and we have 3 pending HTLCs on Bob's commitment transaction. At this point Bob has the new update. Because Alice signed the update to Bob, Bob can broadcast it if he wants to and everything is OK. What Bob does now is revoke the prior state. He says: I like State 1, I don't need State 0. When Bob revokes the prior state, he sends over the preimage for that prior state to Alice. You'll also notice that when Alice sends over that new commitment update, she consumes one item from Bob's revocation window; the revocation window is decremented by one. Bob has this new state and revokes the prior one, and when he does, he also adds onto the end of the revocation window, so now Alice can do two more updates without Bob responding, in a desynchronized manner. Now Alice has Bob's revocation hash. Bob sends over to Alice: "OK, I like that update. Here's your version." Alice revokes her prior state and then Bob gets her new revocation hash. So what we did here, Alice basically batched 3 updates: she added 3 HTLCs at once in a single commitment update rather than doing 3 individual updates. With this you can batch and get very high throughput, up to a certain amount. One thing you have to be careful about in terms of batching is that you need to ensure the transaction can actually get onto the blockchain, because you can create something that has 10,000 outputs and that's going to violate the cost or weight limit in terms of the maximum size of a transaction. If you stay below that, you can keep batching these updates and do them in a single one. Alice has these 3 HTLCs and she has sent them to Bob. It's now Bob's turn. Let's say Bob, after some period of time, gets the preimages. Once Bob gets the preimages, he can settle all 3 HTLCs in a single update. Bob sends 3 Settle messages, adds them to the log, and basically settles A0, A1 and A2. Bob also adds 2 HTLCs of his own going in the other direction. With a single update there is a brand new set of HTLCs in the commitment transaction…
At that point, Bob can create a new update for Alice, and Alice's update references everything Alice has done (i.e. her HTLCs) and also references Bob's entries, meaning Bob's settles and the new HTLCs he added himself. At this point, Alice says "I like that state update", revokes her prior state, and Bob gets a new revocation hash from Alice. If Alice tries to broadcast A0 or A1, Bob has the preimage and can punish Alice. Alice then replies back to Bob, gives him a commitment transaction which references into this shared log, and then Bob is like "OK, I like that too" and that's done. This illustrates the concept of the LCP, where both sides are completely desynchronized. You can batch and pipeline updates within a certain revocation window and, for the most part, things are non-blocking. If at some point Bob is moving too fast for Alice and keeps extending new HTLCs and new commitment updates, there is a timer wait-out period…
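To summarize the walkthrough, here is a hypothetical sketch of the message types involved; the names and fields are mine, not the wire protocol:

```go
package lcp

// AddHTLC appends an offered HTLC to the shared log (the "mempool").
type AddHTLC struct {
	Amount      int64
	PaymentHash [32]byte
	Timeout     uint32
}

// SettleHTLC clears a logged HTLC by revealing its preimage.
type SettleHTLC struct {
	Preimage [32]byte
}

// CommitSig extends the remote commitment chain: the signature covers a
// view of the commitment transaction that includes every log entry up to
// LogIndex. Sending one consumes a slot in the peer's revocation window.
type CommitSig struct {
	LogIndex  uint64
	Signature []byte
}

// RevokeState revokes the prior commitment by disclosing its preimage,
// hands over the next revocation hash, and replenishes the window by one.
type RevokeState struct {
	PriorPreimage      [32]byte
	NextRevocationHash [32]byte
}
```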
LND (Lightning Network Daemon) Architecture
Now to lnd. So lnd is the Lightning Network Daemon. The name might change but this is what we're calling it for now. The language is Go. If you don't know Go, it is a systems language created by Google around 2008 or 2009. Initially it wasn't very popular but now it is getting more and more popular. It was created by the guys who initially worked on Plan 9 and C, and they had a few principles in mind. One is that Go is very simple: its syntax is extremely simple, it doesn't have a lot of line noise like Rust or C++ or anything like that. There is also a de facto formatting of the language, enforced by a standard tool, so it is very easy to work in a large codebase because everyone's code looks the same. Go also has first class concurrency support. It has lightweight threads and primitives to do message passing between those threads, and that comes in handy in several places in lnd because we want this to be as concurrent as possible. We want to optimize the daemon itself such that it can handle several HTLCs at once. It also has a pretty expansive standard library, meaning that a lot of things like TLS and cryptographic operations are all in the standard library. You also get a single statically linked binary which is very easy to cross compile for various operating systems. We use a particular Bitcoin library called btcsuite. These are the guys who made btcd, an alternative full node implementation that I contribute to myself. I think it is a great library: it has very good documentation and very good testing, and I think it is a pretty good resource for newcomers to learn Bitcoin because you can basically jump right into the code. The code is pretty well documented, pretty well commented, and for the most part they're very helpful; you can jump on IRC and they'll help you out. We use a series of libraries from these guys, from the GitHub repo itself. That includes handling scripts, creating transactions, validating transactions, doing things with BIP32 HD keychains, some bloom filter stuff for SPV nodes, and so on. The daemon also has an RPC server. Only one version is implemented right now, but in the final version you'll be able to pick between two RPC servers. The first one is HTTP REST. As you guys know, it is HTTP based, you do GET and POST and various other verbs, and you can use it from a Javascript library or via the regular command line (JSON). We also have gRPC. gRPC is based on protobuf, which is a message serialization layer created by Google. It gives you a declarative language where you declare in a file: this is my message, these are the fields, and you compile that to any language. Once you compile it to a language, you basically get serialization for free. That serialization is optimized to a degree: it handles things like variable length integers and tries to keep things compact. gRPC is an extension of that. In the same configuration file you can define an RPC service, and the RPC service says "this is Foo; Foo takes the message Bar and then returns Baz". You compile that and it generates an entire server stub and also client bindings for that server. The cool thing about gRPC is that it is based on HTTP/2, which is the next generation of HTTP and adds things like binary serialization, flow control, multiplexing and so on. A fun fact for you guys: the main co-author of HTTP/2 was Mike Belshe of BitGo.
Another cool thing about gRPC is that it supports bidirectional streaming RPCs, meaning that we can have a stream between client and server and they can communicate bidirectionally without having to create a new TCP connection for every single request.
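Here is a sketch of what the server side of such a bidirectional stream looks like in Go. The request, response, and stream types are stand-ins for whatever the protobuf compiler actually generates, and the dispatch function stands in for the HTLC switch hand-off:

```go
package rpcsketch

import "io"

// paymentStream mirrors the Send/Recv pair a gRPC bidirectional stream
// exposes; sendRequest and sendResponse stand in for generated messages.
type sendRequest struct {
	Dest   [32]byte
	Amount int64
}

type sendResponse struct{}

type paymentStream interface {
	Recv() (*sendRequest, error)
	Send(*sendResponse) error
}

// servePayments drains payment requests from one long-lived stream,
// dispatches each concurrently, and writes responses back from a single
// goroutine (a gRPC stream permits only one concurrent writer).
func servePayments(stream paymentStream, dispatch func(*sendRequest) error) error {
	resps := make(chan *sendResponse)
	errs := make(chan error, 1)

	// Reader: keep pulling requests; hand each off without blocking.
	go func() {
		for {
			req, err := stream.Recv()
			if err != nil {
				errs <- err
				return
			}
			go func(r *sendRequest) {
				if dispatch(r) == nil {
					resps <- &sendResponse{}
				}
			}(req)
		}
	}()

	// Writer: single owner of Send.
	for {
		select {
		case resp := <-resps:
			if err := stream.Send(resp); err != nil {
				return err
			}
		case err := <-errs:
			if err == io.EOF {
				return nil // client closed its side of the stream
			}
			return err
		}
	}
}
```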
This is the architecture of the daemon. The two things in italics, the wallet controller and the chain notifier, are actually interfaces. The wallet controller is basically a bare bones interface for a basic Bitcoin wallet. This wallet doesn't understand lightning at all, but it can give us outputs and keys and do the basic things. Around that we have the LN wallet. The LN wallet is the version of the wallet which encapsulates the wallet controller interface, and that can drive the daemon because it knows what channels are, it knows how to do funding and so forth. Then we have the funding reservations within the wallet, which allow the wallet to handle concurrent funding reservations. You can imagine there is a race condition where two flows want the same one BTC output, so it needs to be locked up for the duration of that funding; otherwise you have a double spend if two funding transactions reference the same output. The funding reservation makes sure that doesn't happen. Then we have the chain notifier. With lightning it is very important, depending on your contracts and their timelocks, that you watch the blockchain. The chain notifier abstracts that away and it is also an interface. This is responsible for things like letting me know when a new block comes in, when something has been spent, or when I have four confirmations for my funding transaction. Because it is abstracted away, you can implement it with things like an API, bitcoind, btcd and so forth. And for the wallet controller itself we have some default implementations in the daemon, which include btcwallet, a wallet created by the same guys as btcd. We're also working on some SPV support so you can drop that right into any other wallet and it should work perfectly. Continuing, we have the funding manager, which bridges the wallet's reservation protocol and the P2P protocol. These were designed to be relatively decoupled, meaning we can take the same wallet and use it in some other application independent of what we design for the P2P protocol. We have BoltDB. BoltDB is a pure Go embedded database, a pretty simple key-value store. This is what we currently use to store things like the state of the channel, what my current identity is, things like the routing table and so on. Next we have the network router. The network router communicates directly with the server and assembles what it needs from the network in terms of the current graph. So the network router knows about all the channels currently open on the network, it knows what my current neighborhood size is, and it handles the layer 3 routing. Once a user wants to send a payment, it goes through the network router and then to the HTLC switch, which is connected to all the peers. The switch is just concerned with multihop forwarding. It treats all the peers and open channels as interfaces: when a payment comes in, it knows who to send it to next and forwards it on the correct interface. Finally we have the RPC server, which lets the user control and drive all of these aspects.
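As a sketch, the two interfaces might look something like this; the method sets here are illustrative guesses, not lnd's actual definitions:

```go
package lnwallet

// Utxo is a spendable output the wallet controls.
type Utxo struct {
	TxID  [32]byte
	Index uint32
	Value int64
}

// WalletController is the bare bones, lightning-agnostic wallet: it hands
// out keys and outputs, and the LN wallet wraps it with channel logic.
type WalletController interface {
	// NewAddress returns a fresh address under the wallet's control.
	NewAddress() (string, error)
	// LockOutputs reserves spendable outputs totalling at least amt so
	// concurrent funding reservations can't double-spend each other.
	LockOutputs(amt int64) ([]Utxo, error)
	// SignInput signs the given input of a funding or commitment transaction.
	SignInput(tx []byte, inputIndex int) ([]byte, error)
}

// ChainNotifier abstracts watching the chain: new blocks, confirmations,
// and spends of watched outpoints. It can be backed by btcd, bitcoind,
// or an external API.
type ChainNotifier interface {
	// RegisterConfirmationsNtfn fires once txid reaches numConfs confirmations.
	RegisterConfirmationsNtfn(txid [32]byte, numConfs uint32) (<-chan int32, error)
	// RegisterSpendNtfn fires when the given outpoint is spent.
	RegisterSpendNtfn(txid [32]byte, index uint32) (<-chan struct{}, error)
	// BlockEpochs streams the height of each newly connected block.
	BlockEpochs() (<-chan int32, error)
}
```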
Demo
I guess I am going to do a demo now. I have two VPSes right now: one is in New York, the other is in San Francisco, so we have a bit of real latency and we can see some real scenarios. On the top right screen I have a btcd node running in simnet mode. With simnet, I can create blocks instantaneously and I don't have to wait for a block to come in every twenty minutes or so like it would on testnet. Here's the node right here; do a getinfo. I'm going to start the two lnd nodes. Here's the first one, here's the second one. Both nodes are up and currently connected to btcd. Right now, to control lnd we have a CLI called ln-cli, similar to bitcoin-cli or btcctl, which lets us do various RPC commands. We have getinfo and that shows the Lightning ID and the identity address. The ID is basically just a SHA256 of the node's public key; that's what we're currently using to identify a node within the network. One thing I'll do real quick is connect the nodes. Both nodes are connected now. If we do listpeers we see we have a node and a Lightning ID, and there's nothing else to report because we don't have a channel open yet. We can now open a channel. Take off that block parameter, we had that on there before. Now we're going to open up a channel with Peer 1. We have 100 million satoshis, or 1 Bitcoin, and we want to wait for exactly one confirmation before we consider the channel open. There we go, the channel is open now. Both sides are currently waiting for the channel to finish: they went through an initial funding workflow where this node right here, the demo 1 node, basically said "hey, I want to open a channel with you", and then they went through the workflow. Currently in our daemon we only have single funder channels, and this is in the interest of simplicity, of both the implementation and the network itself. Also, if you want to do a dual funded channel, where we both put in 5 Bitcoin each, that might require a little more trust. At that point you're working with some stranger who has your money tied up, and if they go away you can be waiting for a week or so and get inconvenienced. On the left hand screen we have some logs running in verbose mode, so you can see all the logs that are there right now. Initially you can see the funding request: you send that over, it gets a response. This is basically just giving the parameters to open the channel, such as the various keys we need and parameters like how long we need to wait for the CSV delay. Then finally the originating node broadcasts the transaction. At this point, both nodes are waiting for a single confirmation. We can give them that confirmation really quickly by having btcd generate a single block. The block has been generated now and both sides are ready to rock essentially. If we come over to this guy, the node that got connected to, and do listpeers, we see we have a channel open with the other guy. That's one Bitcoin and the other side has all the money. If we go over to the other guy, we have a one Bitcoin channel with the other person and the local balance is one Bitcoin, so I have all the money. We see some log messages over here on the left hand side, and what they're doing is filling up that initial revocation window. Both sides basically fill this up by sending revocations. These revocations have a nil preimage, meaning they don't actually revoke anything; they're just meant to populate the initial revocation window.
One thing I want to show is sending some payments and some of the APIs, the RPC server and the client itself. I have this small Go program over here, and what this program does is first create a client. Creating that client is just connecting over localhost to the daemon itself. Once we have this client, we basically have a stub of the gRPC server. Using this stub we can send payments around as we want to, and we work with native objects in whichever language we're working in. One other thing gRPC has is bidirectional streams, and we're going to be utilizing that here. So what the client does is create a stream initially, and a stream basically just opens a new session between itself and the RPC server. With this stream we can then send and receive in a non-blocking manner, and the server can do the same. We're going to show a burst of HTLCs going across, some micropayments. We want to send 2000 satoshis, but we're going to send those 2000 satoshis one satoshi at a time, meaning we're going to complete 2000 total payments. The loop here keeps going until all the satoshis are sent. Each send attempt launches a new goroutine; these are basically like lightweight threads in Go and you can launch like a thousand of them. There's not much overhead, they have very small stacks, and the runtime handles them rather efficiently. After each of the payments has been completed we're going to push down on the… and then print out the number, and finally at the end I'll print out the elapsed time. If we come over here to the server, this is basically the way the server is set up and the path through it. Initially, on the right here, we have the RPC server, and this is the method that handles the sendpayment command. It basically reads in from the client, and once it reads in from the client it launches a new goroutine. This goroutine sends the new payment request over to the HTLC switch, and it finally responds back to the client once that's been completed. Over here on the left is the switch itself. It gets the packet and checks whether it has the proper interface or not. If it has the proper interface, it goes through and attempts to send the payment out if there is sufficient capacity. We have right here a pre-compiled binary so I can just hit Enter and the demo will run. As you see, we're done here. We're still scrolling here on the left, but that's just the log messages taking more time to flush through the buffer; it is essentially done by this point. You see it took about 1.8 seconds: we sent 2000 individual updates in about 1.8 seconds, so we have about 1000 TPS. With micropayments we can keep doing this. Note this is only on one channel and in a single direction, so you can assume that if we do this bidirectionally we can increase the throughput twofold. This is per channel and per node, so it can scale horizontally; it is only dependent on latency and the hardware of the node itself. Currently the code is pretty much I/O bound because we're using an inefficient manner of persisting the states. That can be changed to something like an append-only log, which would make things much more efficient. If we do listpeers here, we see that the remote balance is 2000 and we now have 2000 less than one Bitcoin. This guy over here is already done because, as the receiver, he has fewer log messages. Finally, it extends the local chain and we see that this is the final commitment transaction here.
This commitment transaction has 2000 satoshis to us. We have this delayed output, showing it is a Witness Script Hash, and then the other side gets the rest. They can spend their money immediately because this is our version of the commitment transaction. So that's the quick demo. At this point, one of the remaining steps is the commitment protocol, LCP. I'm going to be looking into doing some formal verification in the form of TLA+ or PlusCal, which is a modelling framework created by Leslie Lamport used to check the correctness of concurrent protocols. Because we've just created a concurrent protocol, we'd like to have some assurance as to its exact qualities: make sure we have liveness, meaning we don't end up in deadlock, and that we have safety, meaning we always end up at the same state, and things of that nature. If we wanted to send another quick payment we could do one; sending 100 satoshis. And that's finished now. So finally we'd like to close out the channel. When we close out the channel, we get the channel point and then we close it out. We do closechannel, funding txid =…. there it is. So just like that, this side, the initiator, sends the closechannel request, the other guy then accepts it and broadcasts the closing transaction. As you can see here on the left hand side, the closing transaction has been broadcast; this is the witness spending from the multisig, with our two keys and also the witness script itself. Both sides get their money: this guy gets his 2100 satoshis, the other guy gets the remainder, and we pay a small fee. Now finally, in order to close everything out, we need to generate another block. So the block has been generated, both sides have closed the channel and everything is good. Now if we go back to this guy and do listpeers, we see we still have a peer but there aren't any channels remaining, and both sides have now settled their balances on the blockchain. So just to recap this demo: we brought up two nodes, we opened a channel between them, and we sent 2000 satoshis across as individual 1 satoshi payments in the micropayment scenario. It took about a second or so and we achieved around 1000 transactions per second, counting each HTLC as a transaction. There's about 70 milliseconds of latency between them, but this is not optimized at all. This is just a demo of what we've been working on so far. You guys can pull down the repo and it works now on testnet or segnet; with simnet I can control block creation myself.
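For reference, the payment-burst client from the demo can be sketched like this. The stream interface stands in for the gRPC-generated stub, the message type is hypothetical, and the numbers match the demo:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// sendRequest stands in for the protobuf message the real client sends.
type sendRequest struct {
	Amount int64
}

// paymentStream stands in for the gRPC-generated client stream stub.
type paymentStream interface {
	Send(*sendRequest) error
}

// burst fires `total` one-satoshi payments, one goroutine per payment.
// A mutex serializes Send, since a gRPC stream allows only one concurrent
// writer, and a WaitGroup waits for all sends before timing the run.
func burst(stream paymentStream, total int) {
	var (
		wg sync.WaitGroup
		mu sync.Mutex
	)
	start := time.Now()
	for i := 0; i < total; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			defer mu.Unlock()
			stream.Send(&sendRequest{Amount: 1})
		}()
	}
	wg.Wait()
	elapsed := time.Since(start)
	fmt.Printf("%d payments in %v (%.0f TPS)\n",
		total, elapsed, float64(total)/elapsed.Seconds())
}

// nopStream lets the sketch run without a live daemon.
type nopStream struct{}

func (nopStream) Send(*sendRequest) error { return nil }

func main() { burst(nopStream{}, 2000) }
```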
Future Directions in Routing - Path Authentication
So now some things about routing. One issue we ran into with lightning is path authentication. If I'm in a network and there's no curation, no one tells me "this is the graph", any node can feed me an invalid path. You can imagine that me as a node, I get isolated and then someone feeds me this parallel network that doesn't actually exist. I route all my payments through it, my payments become stuck, and I have to wait the entire time because there's an attacker. To prevent this, we authenticate all the path advertisements, meaning when you tell me there's a path between Bob and Charlie, you give me a proof of that path. The proof consists of two different things. The first part is an SPV proof of the funding transaction, meaning you treat me as a light client and give me a proof showing that at some time this output was created in the blockchain; you show me sufficient work and I say "OK, this channel was there at some point". But now I want to know that you have a connection in the network between these two peers, and that those two peers actually know the private keys of the funding transaction, which is the 2-of-2 multisig. To do this we use an aggregate signature over the four pseudonyms. Imagine A_1 and A_2 are the two identities on the network, let's say their identity public keys, and B_1 and B_2 are the channel public keys, meaning the public keys within the blockchain itself, the 2-of-2 multisig. Both sides add their two points together (the public keys) and get C_1 and C_2. Then they take C_1 and C_2 and add those together to get C, which is basically the group channel public key. In order to link all four identities, they generate a signature over some details of the transaction hash, and that signature is an aggregate signature using EC Schnorr. Rather than doing four individual signatures, we do one single signature that authenticates all of the parties. You could do two signatures, but that would allow multiple peers on the network to attest to a single channel, and that would give you a non-canonical view of the network. We want to say every single peer is pairwise connected both in the network and within the blockchain itself. If you send this to me I'm like "OK, I'll add this because you did some work", meaning you did work to pay the transaction fees and create the channel, and you also did work to lock up some funds in the channel itself.
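Here is a sketch of just the point aggregation described above (the Schnorr signing itself is omitted), using btcec; the function name is mine:

```go
package pathauth

import (
	"math/big"

	"github.com/btcsuite/btcd/btcec"
)

// aggregateKey sums public keys into a single group key. For a channel
// proof the inputs would be the two network identity keys A1, A2 and the
// two channel multisig keys B1, B2, combined pairwise as described above:
// C1 = A1 + B1, C2 = A2 + B2, C = C1 + C2. One EC Schnorr signature
// verifiable under C then attests to all four keys at once.
func aggregateKey(keys []*btcec.PublicKey) (x, y *big.Int) {
	curve := btcec.S256()
	x, y = keys[0].X, keys[0].Y
	for _, k := range keys[1:] {
		x, y = curve.Add(x, y, k.X, k.Y)
	}
	return x, y
}
```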
Future Directions in Routing
So I collaborated with BitFury on a paper and we called it Flare. It is an initial approach to some things we would like to see in terms of routing in the network. Flare borrows heavily from the existing literature on MANETs (mobile ad hoc networks), which are mesh networks. The setting in lightning is very similar to a mesh network, meaning there's no central provider, no one assigns addresses, it should be self-configuring, and nodes may come and leave at any time if they go offline. We thought we could learn a lot from the literature on routing protocols in MANETs. Flare is a hybrid protocol, meaning it combines two phases. Typically you have proactive routing or reactive routing. With proactive routing you have link state or distance vector routing, meaning you collect all the information proactively, and once you want to send, you already have all the information. That comes at a cost: a storage cost, and also a bandwidth cost from handling all the updates from everybody else. Reactive routing basically says: I won't keep the entire state; when I want to send a payment I may have to consult the network, which adds latency to my connection establishment. If the network knows my path, it sends the path back to me and then I can route and send packets around. This is reactive because at the point I want to send, I go to the network. Flare is a hybrid routing protocol which combines these two approaches. First, there is the proactive part. As a node you have an initial neighborhood radius, something like four or five hops at max. Within this neighborhood radius you handle all the updates: you handle everyone opening and closing channels, you handle people joining and leaving. Because this is only a subset of the entire network, you have savings in both bandwidth and storage. Rather than worrying about like 100 million peers, I only worry about maybe 100. Then we have these things called beacons. Beacons kind of borrow from Kademlia in that they use an address distance, meaning you might be a possible beacon candidate for me if our address distance is close. That address distance is basically: I have my address A, you have your address B, and we XOR those. If you're a possible beacon, I will add you to my table and I'll get your other routes from you. There are two parameters: the neighborhood size and the number of beacons. What happens initially is you connect to the network, you fill in the initial neighborhood, and from there you do the beacon search. You consult all your current neighbors: who is close to me, such that I get a better view of the network? Because the address is generated via a hash function, and because of the XOR metric, this basically gives me some random feeler connections out into the network. So I have a very well illuminated view of my neighborhood, and in addition I have some feeler connections out into the network that are further away and randomly distributed. This basically resembles a fog of war, where I initially have a very good view of my local area but beyond that it is a little more foggy. Then we have the reactive aspect. The reactive part comes in when I want to send a payment. Because this is lightning and fee updates are very fast, we could flood all those updates and all the fee schedules, but that may consume a large amount of bandwidth and they may change very rapidly. So instead, we know our candidate paths and we establish a HORNET onion circuit through each candidate path.
Then, as we're establishing the circuit with each node within the path, we collect additional payment information, and this payment information is in the form of fees. So initially I have path discovery, and then I have onion circuits on every single one of the candidate routes. I pick one, maybe two for redundancy, the one that has the least amount of fees, and I can use that route to send payments and exchange additional hashes between me and the other person. Let's say my beacons were insufficient, meaning that with my local neighborhood and my beacons I wasn't able to find a proper path. What I can do then, since I know your address, is use the beacons to do a DFS search using this address distance and eventually get to you. This is similar to an iterated DHT lookup rather than a recursive one, and using this I'll be able to find a path with higher probability. There is a drawback, namely that we don't get optimal routing distance: because we're using this probabilistic structure, we may do some unnecessary hops along the way. But what this allows us to do is keep a very, very small amount of client state yet still be able to route with high probability.
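A sketch of the Kademlia-style address distance used for beacon selection, assuming the node address is the SHA256 of the node's public key as mentioned in the demo:

```go
package flare

import (
	"bytes"
	"crypto/sha256"
)

// addressOf derives a node's routing address from its public key.
func addressOf(pubKey []byte) [32]byte {
	return sha256.Sum256(pubKey)
}

// xorDistance computes the XOR metric between two node addresses.
func xorDistance(a, b [32]byte) (d [32]byte) {
	for i := range a {
		d[i] = a[i] ^ b[i]
	}
	return d
}

// closer reports whether candidate is nearer to target than current, so a
// node can decide if a newly discovered peer qualifies as a beacon.
func closer(target, candidate, current [32]byte) bool {
	dc := xorDistance(target, candidate)
	du := xorDistance(target, current)
	return bytes.Compare(dc[:], du[:]) < 0
}
```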
Initially we won't be implementing all of Flare because some of the optimizations are unnecessary in the initial stage. Maybe we have like 1000 nodes and that's good for us. Flare is for when you have hundreds of millions of nodes and you don't want to store all the information. If you only have 1000 nodes and every node has ten channels or so, that's not much state; everyone can keep that state initially. We still use the channel proofs. Because we have the global state, we have all the information, so we can achieve the optimal path length and find a route in optimal time.
I work with Lightning Labs and we're hiring to add to our team: front end engineers, systems engineers, protocol engineers. You can find our code, the daemon that I showed, at lightningnetwork/lnd, and some of the underlying code I mentioned, HORNET and Sphinx, which add privacy to the network, at lightningonion.