Stratum v2 Bitcoin Mining Protocol
Speakers: Jan Čapek
Date: November 27, 2019
Transcript By: Stephan Livera
Media: https://stephanlivera.com/download-episode/1649/128.mp3
Podcast: https://stephanlivera.com/episode/128/
Stephan Livera: Jan, welcome to the show.
Jan Čapek: Hi, thanks for having me here.
Stephan Livera: So Jan I had the pleasure of meeting you at Riga for Baltic Honeybadger and I know you’ve been doing a lot of work on Stratum, the second version of the protocol. First I wanted to get a bit of background from you, how did you get involved with all of this and working as one of the CEOs of Braiins and Slush Pool?
Jan Čapek: So I come from an engineering background; I studied operating systems at university, and back then, around 2005, that's where I met Pavel at my first job. We were working for a mainframe company, so it was not exactly the dream job a software engineer would want. Then our ways pretty much parted, because I went into the embedded systems area, as I was very interested in operating systems and Linux, while Pavel was more into information systems, banking and things like that.
Jan Čapek: Then we got together again around 2009 on a random project where I was building infrastructure and he was designing an information system, and around 2011 we got the idea to start the Braiins company. Our primary focus was developing custom firmware for embedded systems, so [inaudible 00:01:19] and ECUs for cars and things like that. That was the primary focus, but then in 2013 we got involved with Bitcoin. Actually, I heard about Bitcoin around 2011 or 2012 from Slush, because I met Slush on a sailing trip.
Jan Čapek: He was Pavel's best friend from childhood, but I wasn't sure if this was the thing I should be spending my time on. Then around 2013 I really realized that this is something that's going to change history. At that time Slush had already been running the pool for two years, and he was interested in moving over to a new project called Trezor, the hardware wallet, and he needed somebody responsible who would take over the pool, get involved, and scale it out, because it was pretty much a small prototype running on one machine.
Jan Čapek: So that's when the Braiins company got involved, and we've been operating the pool from 2013 until now. Another significant milestone for us was starting BraiinsOS, the open source initiative for mining firmware, because we believe that since Bitcoin is open source, the mining side that's actually extending the blockchain should be open source as well. So this is how all these things play together.
Stephan Livera: Yeah, that's a great explanation, thank you for that. With the different components, it seems that there's this drive towards improving decentralization and improving the control that the user has, rather than trusting, let's say, the manufacturer, by running your own open source software, in that case with BraiinsOS. So I'm also interested to understand a little of the background of how Stratum came together. From my reading, as I understand it, in the early days there was this concept of getwork, then later there was getblocktemplate, and then there was the Stratum v1 protocol.
Stephan Livera: Can you help us understand a little bit of the context around that? What are those different pieces and how do they fit together in Bitcoin’s mining history?
Jan Čapek: Okay, so originally the miners were connected directly to Bitcoin Core. The problem was that solo mining wouldn't work, because people needed a reward evenly spread out over time, and that's where Slush came up with the idea of concentrating computing power. However, the existing protocols for distributing mining jobs were not sufficient; they would not scale.
Jan Čapek: So basically, if a large enough miner would connect to your pool you would have big resource problem being able to ship mining jobs to the miner. So that’s when the idea of Stratum version one, it wasn’t called version one at the time, but the idea of the original Stratum protocol came where the pool would ship only the essential parts of the mining job to the miners, and he would be also controlling the difficulty of the jobs.
Jan Čapek: So basically the miners were connecting to the pool, they were running on a long term connection on TCP/IP connection and the pool would be supplying the jobs. If for some reason the miner grows, it has more hashrate, it’s running this hashrate through that single connection, the pool would immediately see the change and the submission rate and would try to adjust the difficulty thread.
Jan Čapek: So this was possible with the Stratum protocol because it has this flow control called difficulty. So that was the major invention, and basically with this feature you could scale the amount of hashrate connected to your pool almost infinitely. At the same time this optimization or this feature works if you try to aggregate the hashrate on a single connection. The current state of mining, it looks like some farms do want to have separate connections for every single miner, but still the pool is able to control the frequency of submits through the difficulty setting.
Jan Čapek: So that was the Stratum v1. It was a major shift allowing operating the pools on large scale, and we have been using it ever since then, but it had some major flaws. One of the flaws was that protocol itself was completely insecure, so it’s pretty much text based or JSON based protocol. So any message sent through the server is clear text, plain text, anybody can read it who’s in the middle of the communication and anybody can change it.
Jan Čapek: So this is a big risk for the farms and there have been attacks through rerouting traffic through BGP, or even now today there are malicious routers in the infrastructure that actually are able to detect Stratum traffic and they do some little changes. Basically they can steal any amount of hashrate that they like, and if it’s some small number you would not even notice if you’re losing one percent of your hashrate you could say, “Oh, this is bad luck or there’s some variance in my hashrate.” So this is the first thing that we try to address with the new protocol.
Jan Čapek: Second, we wanted to get rid of the text based protocol completely, because it's not very economical; it consumes a lot of bandwidth. So we switched the protocol completely to binary with Stratum v2, and we've also looked into the construction of the individual protocol messages and tried to design them to be more efficient.
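To make the bandwidth argument concrete, here is a rough size comparison of a JSON-based v1 share submission against a packed binary encoding. The field values and the exact binary layout below are illustrative assumptions, not the official v2 wire format.

```python
import json
import struct

# A typical Stratum v1 mining.submit message (field values are made up).
v1_submit = json.dumps({
    "id": 4,
    "method": "mining.submit",
    "params": ["worker.1", "job_1234", "00000000", "5e0f4b12", "1a2b3c4d"],
})

# An illustrative fixed binary layout carrying the same information:
# 4-byte channel id, 4-byte job id, 4-byte nonce, 4-byte ntime, 4-byte version.
v2_submit = struct.pack("<IIIII", 1, 1234, 0x1A2B3C4D, 0x5E0F4B12, 0x20000000)

# The JSON message is over 100 bytes; the packed struct is 20 bytes,
# and shares are the single most frequent message a miner sends.
sizes = (len(v1_submit), len(v2_submit))
```

Multiplied across millions of share submissions per day on a large pool, this is where the bandwidth saving comes from.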
Jan Čapek: So basically messages in the original protocol that were combining two things at the same time. For example, you as a miner get a notification about a new block and you’re supposed to start mining on it right away, but you can make this a little bit more efficient if the pool tells you, “Oh hey, here you have a new block template, but it’s meant for the future so you as a miner store it.” The mining firmware or proxy remembers it, and then the pool only sends a small notification saying, “Oh, there is a new block that has been found on the network, so please start mining on the block template that I sent you some time ago.” So, that’s the different efficiency improvements.
Jan Čapek: Another part of the protocol we tried to address is that we wanted to have a controlled and well organized way to extend the protocol, and that's why we built the protocol around extensions. So basically any vendor, if they want to, can design their own extension, and they can pretty much run any protocol inside of it.
Jan Čapek: A third efficiency improvement is something we call header-only mining. Maybe this is a little too technical, but the current mining protocol sends you a lot of data so that you can build your own Merkle root. Basically you have to build your own Merkle root, because there's a part in the block header that you as a miner need to adjust so that you get a bigger search space.
Jan Čapek: This approach is still supported in the new protocol because it allows some advance features like proxying and switching to different pools and stuff like that, however we have introduced header only mining where the pool is able to supply a full block template where you don’t have to redo the full Merkle root over and over again. The benefit is, once you submit the result the pool does not have to do the full Merkle root computation again, and so it doesn’t have to build a Merkle root, it’s the tree of the transactions, again, which is saving the CPU time, but more importantly it’s reusing the latency.
Jan Čapek: So once you submit your result the pool is able to evaluate your result or your submission really fast, which also should end up reducing the reject rates. Reject rates, that’s another parameter that miners do care about as they don’t want to see too many shares, the results of the mining to their submitting group being rejected because of being stale or for whatever reason.
Stephan Livera: Got you, so let’s just clarify that there. So, you were mentioning… Sorry, actually do you mind if we just take a step back for one second? I just want to provide a little bit of context for the listeners around the different pieces of the software, if you will. So maybe we could just outline the difference and what are the different pieces, so you’ve got the firmware that’s on the machines, you’ve got a management system and you’ve got pool software.
Stephan Livera: So could you just outline a little bit what are those pieces at a high level, and then how does the Stratum protocol sort of work in with those?
Jan Čapek: Okay. Yeah, I will start from the pool. The pool is a piece of infrastructure and a piece of software whose responsibility is to distribute mining jobs. It has to do it in a timely manner, so basically it periodically refreshes the miners with new mining assignments, mining jobs, so that the value of the block is, let's say, at its maximum, meaning the miners are collecting as many transaction fees as possible. Also, if there's a new block found on the network, the pool has to notify the miners that they should stop mining on anything they've been working on until now, and that they have to start mining on a new block template, because anything they have been working on after a new block is found is completely invalid and will be rejected by the pool.
Jan Čapek: The goal of the pool is to reduce the variance in miners' rewards, because we know that Bitcoins are mined in blocks, and a block is worth 12.5 Bitcoin now, before the halving. You cannot mine just one Bitcoin, and this is the reason why miners connect to a pool.
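The variance argument can be made concrete with a small back-of-the-envelope sketch. The hashrate share below is an illustrative assumption; the block reward and blocks-per-day figures match the episode's timeframe.

```python
# Why miners pool: with p = your share of network hashrate, finding a block
# is a rare random event, so solo income is extremely lumpy even though the
# long-run average is the same as pooled income.

BLOCK_REWARD = 12.5      # BTC per block at the time of this episode (pre-halving)
BLOCKS_PER_DAY = 144     # one block every ~10 minutes

def expected_blocks_per_day(hashrate_share: float) -> float:
    return hashrate_share * BLOCKS_PER_DAY

def expected_days_between_blocks(hashrate_share: float) -> float:
    return 1.0 / expected_blocks_per_day(hashrate_share)

# A miner with 0.001% of the network hashrate earns the same average reward
# solo or pooled, but solo it waits on the order of two years between
# payouts; in a pool it receives a proportional slice of nearly every block.
share = 0.00001
days_between_solo_blocks = expected_days_between_blocks(share)
```

That multi-year payout gap, not the average reward, is what a pool smooths out.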
Stephan Livera: Okay, so that’s the pool part and distributing work from the pool to the miner. Can you tell us a little bit about the software that the miner uses in terms of their own management software, and then the firmware that’s actually on each individual mining device?
Jan Čapek: Okay, I will start from the end, so let's speak about the mining devices first. When Bitcoin mining started, we had a situation where you were able to mine basically on your laptop; the original idea was one CPU, one vote. But mining has changed since then, and it evolved from mining on your machines and your CPUs to GPUs, and then the first FPGAs came. FPGAs are field-programmable gate arrays; to translate this, if you want to design a chip, a piece of silicon, you usually describe the chip in a hardware description language, then you synthesize it and you can test it on an FPGA. So it's kind of a flexible development platform for designing chips.
Jan Čapek: So this was another speed-up in mining, and then the ASICs came, when somebody had the idea, "Okay, we're now good with the FPGA, and we can try synthesizing real silicon." We've been in this performance efficiency race ever since. But the reason I'm explaining this is that originally you were running software called cgminer on your machine, your laptop, your server, whatever, and all the mining logic was inside of that software.
Jan Čapek: When the GPUs came the cgminer software was still being used, but parts of the cgminer were only drivers for the GPU. So there were some parts of code uploaded to your GPU card, and the cgminer was only controlling this piece of hardware. Same thing was happening with the FPGAs, and then when the ASICs came even more logic has been moved out of the cgminer, and cgminer itself is an open sourced software using GPL licensing, but the problem was that people were not noticing that there was a shift in perception of what really open source is.
Jan Čapek: We saw that the manufacturers were producing something that was not open sourced anymore however, back to the role of this component, the whole mining stack. So the firmware itself in the mining device these days is responsible for accepting the jobs communicated through a mining protocol, and try to find the solution for such mining job. The solution is called nonce, it is not important to understand exactly, but this is a number that you can plug in to the Bitcoin block header and when you do SHA256 hashing of the Bitcoin block header you come to a result, and if this meets a target set by the network then you have a valid block.
Jan Čapek: These results are being submitted to the server. What the mining farm can also do is, that if they want to save bandwidth they can run a proxy, which is a regular software running on a server that sits their miners, their mining devices and the pool. These mining devices connect to the proxy and the proxy aggregates the result submission to the pool, and at the same time it looks, with the current protocol, it looks like one big miner. So this is for this part of the mining stack.
Stephan Livera: Yeah, and then can we talk about the different pieces within this puzzle? So you've got the devices, you've got the proxy, and then, from the website, you've got the hashrate consumer and the job negotiator, and then they talk to bitcoind. So can you tell us what the role is of the hashrate consumer and the job negotiator?
Jan Čapek: With the new protocol we wanted to provide a mechanism that would allow miners to choose their own work. The main idea was inspired by BetterHash, because Matt came up with the idea that it would be nice if miners were able to select their work. The problem with BetterHash was that it was not safe for the pool; you as a pool cannot simply accept random work, because part of the work is also the reward for the miners. The way current mining works is that the pool sends you a mining job that contains the miner reward going to the pool's wallet, and the pool then decides how you participated in mining and divides the reward from the block according to your participation in each mining round.
Jan Čapek: So it would not be safe for the pool to allow the miners to choose even this part of the block template, and that’s where the job negotiation protocol came in place. Where the miner, let’s speak about say mining proxy, so the device in a mining farm can negotiate a custom job with a pool, with a transaction set that miner believes is the side that he wants to mine on, and if the pool approves such a transaction set he can start mining on it. So this is the part of portfolio job negotiation protocol.
Stephan Livera: Okay, got it. So before we go further, can we just take one step back and talk about that potential censorship angle that Matt Corallo was trying to address with BetterHash. As I understand it, the idea would be that if, say, a few key big mining pools were compromised, then some transactions could be censored, or potentially those mining pools might have their hashrate redirected towards performing a 51% attack. Could you just outline what that potential censorship attack was, and then how Stratum is trying to help change that by having the miners select their own work?
Jan Čapek: Speaking to the censorship attack, the feature in the mining protocol is there as a security measure to prevent something like this from happening. However, it is important to understand that even if you have this feature, it's still up to the pool whether it wants to implement the job negotiation part. So if we have an evil pool censoring transactions, even if it implements this protocol, sorry, it's not an extension, it is a protocol, it can always deny whatever templates you try to negotiate, so chances are you'll have to go to a different pool that is not doing the censorship.
Jan Čapek: So it is a security measure in a way that if this feature is needed we can have pools that do support it, but at the same time if, let’s say, all the pools collude and don’t support this job negotiation protocol then miners can’t do anything about it. They would have to start their own pools, which is also an option. I would say the job negotiation feature are the benefits of this feature more on the business level where there could be cases where miners would be willing to pay premium to get their own feature, to get their own transaction set being mined, which I think is the right one.
Jan Čapek: Most of the miners are going to shoot for maximizing their mining reward, so they want to get the most expensive transactions selected by Bitcoin Core, but at the same time there could be some miners that do want to have a specific set.
Stephan Livera: Yeah okay, got you. Are there any other attacks or vectors that the current Stratum protocol is more vulnerable to?
Jan Čapek: We have covered the man in the middle attack, but I can briefly repeat this part. If the current protocol were running over a secure channel we wouldn't have this problem, but unfortunately, in most cases it's not supporting any encryption or any authentication. So what can happen with the current protocol, the only major problem, is stealing hashrate by inserting a man in the middle and altering the submissions. That's pretty much it, I would say.
Stephan Livera: Okay, got you. So one other question I'm curious about as well is, how do mining pools know that a miner isn't faking it or bluffing in terms of how much hashpower they're contributing to the pool?
Jan Čapek: That's the whole point of proof of work: you cannot fake your computing power, because the mining puzzle, the proof of work, has a certain difficulty. The property of this game is that you have to use a certain amount of energy in order to find the solution, because the crypto puzzle, the proof of work, pretty much shows that you have used a certain amount of energy to find the result. Finding a result is a random event, and the probability of finding a result is always the same.
Jan Čapek: So it doesn’t matter what you did in the history or what you, whatever results you had doesn’t impact what you will find in the future. So it’s impossible for the miners to fake their computing power because the result that they’re submitting directly manifest the amount of computing power they have. Maybe an important feature to explain is that the result is always associated with the difficulty that the pool assigned to the miner as a mining task.
Jan Čapek: This difficulty directly reflects the amount of effort that the miner has used, but I can actually think of… There is an attack on the mining, which is definitely difficult to prevent. I don’t know if the current protocols are able to prevent it because it is not very simple and it would require Bitcoin to hard fork, and the attack is that miners who actually find a block. If they intentionally or unintentionally fail to submit such block, that is called a block withholding attack, then the pool has little bit of a problem because they would not notice. It just looks like the miner has serves lower hashrate, but it’s very difficult to detect it you can only detect it on big amounts of data and on the big miners.
Jan Čapek: If the attack is, let’s say, crafted in a very elaborate way where it’s through many miners then it’s a real problem, but this is not prevented by a BetterHash, this is not prevented by Stratum v1 or Stratum v2. It cannot be easily prevented unless we have a chance in Bitcoin protocol so that the miner actually is not able to evaluate if he or she found a block or not.
Stephan Livera: So-
Jan Čapek: At the time of submission.
Stephan Livera: Oh, I see. Yeah, so they would have to submit their work to the pool without knowing that they had correctly solved a block.
Jan Čapek: Yes, and there are some proposals for this, but there's no solution. The question is whether you really need a solution, because Bitcoin is designed around incentives, so there's no good incentive for miners to withhold blocks, because they're damaging their own rewards as well. It's an interesting attack on specific pools; for example, PPS pools could easily bleed out, because PPS pools pay by share. So basically, if you solve the puzzle for the pool at a difficulty set by the pool, which is much lower than the network difficulty, you get paid for every solution, and the payment is proportional to the current difficulty of the network. It can be easily computed, and such a pool pays out these rewards completely independently of how many blocks it finds.
Jan Čapek: So if they have a big miner who is doing such an attack, which could be detected if the miner’s big enough using some statistical analysis, then they would have a problem because they would bleed out. They would be paying money, but there would be no blocks found on their set or no blocks, smaller percentage of blocks because the other miners are assumed not to do the block withholding.
Stephan Livera: Right, the miner in that case is kind of cheating or scamming the pool a bit out of money, because they’re just getting paid for shares that they’re not actually providing because they’re withholding them right?
Jan Čapek: Yeah, this is a realistic attack, but we haven't heard about many being detected or happening. Another case is an unintentional block withholding attack, if for example you have a bug in your firmware, or if you're rolling your nTime incorrectly, though in that case it should be detected on the pool side, because such shares are actually invalid. So there are different cases; it doesn't always have to be an intentional thing. Imagine there's a piece of code in your mining software that says, "Okay, if I found a block, I just want to know about it, so I increment this counter." But this code path is not executed very often, right? How many times does your miner find a block? And the software may not be properly tested; for cgminer, for example, there is no test suite available.
Jan Čapek: Then what if the software crashes in that exact moment where it’s trying to increment that counter of found blocks that is going through a code path that is executed very rarely, like, could be a bug that causes a crash, or some other problem where the software actually doesn’t get to submit the block because it’s not running anymore, and this can happen.
Stephan Livera: Right. Yeah, hopefully that sort of thing would be caught in a test before the software rolls out though.
Jan Čapek: That's why we're doing our own replacement for cgminer; our project is called bOSminer. It's an essential component of BraiinsOS that's coming out, hopefully in December or end of December, as an alpha. It's written in a modern language, tries to address all these things, and supports Stratum v2 from day one. So yeah, we do realize that this could be a potential problem as well.
Stephan Livera: Got you, and while we're on the topic of these different pieces of software, could you just give us an overview, as there are different pieces of software out there in the system, out in the wild. So as I understand it, from your point of view, you've got BraiinsOS, which is the operating system, then you've got bOSminer, which you've mentioned, and then you've got Stratum v2. But in the broader Bitcoin mining world, I presume there are other pieces of software, as you mentioned there's cgminer, and there are other protocols that might be in play. Could you just give us an overview of the Braiins stack compared to some of those others?
Jan Čapek: Okay, when we started the BraiinsOS initiative, we looked at the current state of mining and, as I tried to explain, the evolution of the mining hardware, the mining devices. Today the miner is no longer just the mining software; you need an operating system running on an embedded device, which is another essential component, and which is typically closed source, or whatever the manufacturers publish is usually incomplete.
Jan Čapek: We started looking into this some time around 2016, 2017 and we saw, “Oh, this is not where it should be.” If you want to run a Bitcoin Core you just download the software, you compile it and you’re a part of the network. If you want to build firmware for your mining device that you bought for your money, you don’t have such great possibilities because, for example, Bitmain they publish the bmminer, which is actually a fork of cgminer. So everything it was starts from cgminer in this ecosystem.
Jan Čapek: That’s only the software and it would be very difficult and challenging to find all the bits and pieces to run the full firmware image on the mining device and the question is, what else could be broken, hidden, whatever, compromised inside of the miners if you don’t have the full control of the stack? So we try to address it with BraiinsOS where you have the full operating system, which is open source it’s based on OpenWRT a generic Linux distribution meant for routers and embedded devices.
Jan Čapek: Then we used a snapshot of bmminer for the S9’s and we start developing from there. So for the time being the current latest release, which is somewhere from June still contains the bmminer, but in parallel to this activity we also realize that we need to start writing a new software for mining called bOSminer, which would be a parallel thing to cgminer bringing basically cgminer eventually to end of life.
Jan Čapek: The reasons for doing this is, that the current cgminer code base is very cluttered and it is not a very good shape throughout the years because what manufacturers did, they usually took the code base, they forged it, usually did some breaking changes in the source code and they usually never contributed it back to the original upstream cgminer project.
Jan Čapek: Which is also a violation of GPL, which I think is kind of serious. We’re dealing with money here and people are willing to operate their mining devices with closed source firmware that doesn’t have proper audits, you don’t know what it’s doing. We had affairs called ANTBLEED where there were back doors from Bitmain allowing them to shut down the devices even though the feature was advertised as a management feature, but nobody really knew about it.
Jan Čapek: We had affairs like ASICBoost, which is another manifestation of the firmware stack being closed where the S9’s were able to do ASICBoost, but the way it was configured in the firmware for the manufacturer was that it was actually disabled and if you enabled it, it was generating incorrect results. So you could not really use that feature and that was actually preventing you from saving 30% of energy.
Jan Čapek: So these are all the reasons why we decided we want to build a full open source stack and make it a go to place for, let’s go for the community, Ideally I would like this to be go to place for the industry somewhere where the Linux kernel is. If you want to build an embedded device you go to kernel.org you download the sources, you add your drivers and hopefully contributed back because you don’t want to maintain your own fork of Linux kernel, but some manufacturers also keep their forks.
Jan Čapek: So this is where the full BraiinsOS and the bOSminer stack fits in and I would like to say that currently the BraiinsOS is still running cgminer, but this component is going to be replaced pretty soon.
Stephan Livera: Great. Sorry to go back a little bit, but you mentioned earlier rolling the nTime and in my reading I was looking up this idea of version rolling bits. Could you give us a bit of a context, what exactly is that and is that related to header only mining?
Jan Čapek: Yes, version rolling is a feature that allows you to touch certain version bits of the Bitcoin block header, as specified in BIP 320, if I remember correctly.
Stephan Livera: Yeah I think so.
Jan Čapek: Yeah, and these bits can be used to extend your search space, because currently, the way the Bitcoin block header has been designed, it only allows you to roll the nonce field, which is four bytes, which a current miner is able to roll through in milliseconds, leaving no time to even supply fresh jobs to it.
Jan Čapek: Then you also have nTime, which are supposed to touch only every second, which is not fast enough. Then you have the version field, which consists of 16 bits, but can be freely used and this extends the space to, almost 2 to the 48 if I’m correct and that 2 to the 48 doesn’t really mean anything, but maybe some people do know. Assigns are roughly 16 terahash. This space is enough to have a miner that has roughly 280 terahash per second, but is that case you would really have to supply jobs every second to the miner. Yeah, so this is it.
Stephan Livera: Got you, and so as I understand it from my reading, it looks like the Stratum v2 protocol has this version rolling feature natively, whereas in the past this was an extension to the original Stratum protocol?
Jan Čapek: Yeah, this originally was an extension, really meant to allow machines that supported ASICBoost to operate. So v1 has something called mining.configure, which is an extension that allows negotiating protocol features, and one of the features was specifying the number of bits that you can, or that you are willing or need to, roll as a mining device.
Stephan Livera: Got you. Okay, and also you… Sorry, we are jumping around a little bit, but there are just different areas that I wanted to jump into. So, the mining protocol, Stratum v2, defines three types of communication channels, so you’ve got standard, extended and group channels, can you help break that down for us?
Jan Čapek: Sure, standard channels are meant for header-only mining, so these are the most efficient way to distribute work, and they present the least load on the server side when the pool is validating submissions. Extended channels provide much more flexibility, so that you as a mining operation can run a proxy that distributes the work the way you like to have it distributed. So basically the extended jobs are somewhat close to the original Stratum v1, where the miner has to roll the extranonce2 field, and extranonce2 was a field inside the coinbase transaction whose size was actually specified by the pool, as in "how much space I give you."
Jan Čapek: Typically the pool gave you somewhere between four and eight bytes, which extends the search space for the mining job pretty much infinitely. I mean for the validity of the job, the job typically gets updated every 20 seconds, so it was more than enough to feed a big mining operation with a job and then the mining operation would be able to generate sub-mining jobs to it’s physical miners. So this is the extended job.
Jan Čapek: Then the extended jobs can be supplied to standard channels or extended channels, and this may be a little bit confusing, but to explain it. If you’re running a proxy that wants to connect through standard channels to the mining server, the mining server can only send one extended job to this proxy, which is meant for all these standard channels downstream for the miners. The concept of group channels are then in this particular use case denotes a set of channels that this extended job is meant for.
Stephan Livera: Okay, and help me understand here, is this the idea as well to help from a bandwidth saving perspective or is it a computational power saving perspective or what’s the main objective of these different communication channels?
Jan Čapek: This feature is meant for bandwidth saving and also for latency saving. Let's speak about a real scenario: let's say you have a farm that has 100,000 miners and you have a proxy. It's more efficient if the pool supplies one extended job to this proxy (I'm still speaking of header-only mining), so the proxy is able to prepare the merkle roots for all these 100,000 miners locally, and the pool doesn't have to send all these merkle roots. So instead of sending standard jobs 100,000 times, once for every single miner, it only sends one single extended job, and the proxy then distributes these jobs to the miners.
Jan Čapek: So it’s saving the bandwidth and it’s also saving the latency, which could in the end effect could reduce the reject rate, because the miner would know about the job sooner.
Stephan Livera: Yes. Okay, and you mentioned the reject rate there, and as I understand that’s also related with the stale. As I understand that might be where, let’s say I’m a miner and then the mining pool that I’m contributing to they’ve already found a block, but I didn’t know that and I contributed some work after the fact and now that work is no longer useful, is that a correct… Could you help us understand that, what is that stale rate?
Jan Čapek: Yeah, so stale rate or reject rate: if you submit a result to the pool, the pool has to do the validation, and it can reject the share for various reasons. One of the reasons is that the share is no longer valid. It is valid technically, so it fulfills the difficulty that you have been assigned, but it is not valid because a new block has been found in the meantime and you're supposed to be mining on the new block.
Jan Čapek: The new protocol tries to address this reject rate issue by supplying you with a new block template as fast as passible, and actually it sends you the block template ahead. So you get a new block template with a future flag saying, “Oh, this is a block template that you’re supposed to start mining when I tell you.” Then when a new block is found the pool only notifies you that you should start using this block template.
Jan Čapek: Since the block template is a little bigger message than just this small notification about you being supposed to mine on a new block, this also reduces the chances that you will be mining on the stale job, on the old job for a longer time than needed. So it doesn’t eliminate the reject rate completely, there will always be some rejection because this is essentially a race condition in the mining, but it tries minimize it to the lowest possible value.
Stephan Livera: Got you, yeah thank you for that. Also with sort of related, empty blocks. So as I was reading it looks like Stratum v1 is slower to send a full block than an empty block, whereas Stratum v2 has been designed to make it so that there’s no extra delay to send a full block versus an empty block. So could you help us break that down, maybe we could just start with what is the problem with empty blocks to start with?
Jan Čapek: I would say empty blocks overall are not a problem; you can look at it from different perspectives. Currently, when the Bitcoin network finds a new block, it is not a simple task for Bitcoin Core to generate a new template. It can literally take seconds, it really takes some time. So it's better for the miners to have something to work on than waiting and wasting energy, and shutting down the miner is not an option, because you cannot shut down a 100 megawatt operation; that would create glitches and degrade [inaudible 00:43:47].
Jan Čapek: It’s impossible, so the miners are sort of like race cars. They have to go full throttle all the time, that’s the most efficient approach how you have to run them. So you have to feed them with the jobs otherwise you’re wasting the energy, so with the current Stratum v1 protocol it is better for the miner to have an empty block template than not work at all.
Jan Čapek: At the same time with the empty blocks what we were thinking about with the Stratum v2, is that if we can send an empty block template in advance we may be able to send a speculative block template that contains the complement of transactions that are not in the current block template as being mined by the pool.
Jan Čapek: If the transactions are shifted in the mempool by some offset, this could work very well. So chances are that if a new block is found and the miner actually starts mining not on the empty template, but on the template that already contains some transactions that have been lower in the mining pool. If somebody else finds a block, chances are that this template that you already supplied to the miners would still be valid, and then you’re actually gaining time because you gave the miners already useful work that contains the transactions and you as a pool can start negotiating with Bitcoin Core to generate a new block template for you, which is like the current valid one.
Jan Čapek: Then you just refresh the miners with the new template, which is not a real time process anymore you don’t have that deadline where you really have to give somebody the new mining work as fast as possible, the new block template. Does it explain the question?
Stephan Livera: I think so, yeah, and maybe just a question to help improve my understanding here. So, for example, typically miners like, if you’re fee maximizing you want to take the highest fee first, but is it a possibility then that as part of this template getting sent in advance there might be some lower fee transactions that are being sent as part of that and then you might, if you’re lucky, get some lower fee ones confirmed as part of that?
Jan Čapek: That could be the case too. The policies for choosing the block templates can differ, and we didn't want to specify a policy inside the protocol; we would like to leave this up to the implementers. I could see the pool actually sending multiple block templates to the miners, not just an empty one and a full one, but there could be a couple more. Again, as I said, we would like to leave this policy up to the implementers.
Stephan Livera: Great, and one other one I was curious about is you’ve got here zero time back end switching. So as I understand then this means that a miner can switch, which pool more easily that they want to contribute hashpower to. Could you outline what that is and what that process is?
Jan Čapek: I think I need to explain what an extraNonce1 is first.
Stephan Livera: Sure.
Jan Čapek: When a miner gets a job, the job is unique for that specific miner, and in v1, part of this job was something called extraNonce1, which was a value injected into the coinbase transaction. If you wanted to supply a job from a different back end, let's say from a different pool, say you were a proxy distributing jobs, with v1 you would have to completely restart the mining session, or your miner would have to support an extension called extraNonce1 subscription, where the proxy or the pool was able to notify you: "Hey, here is a new extraNonce1, which is the thing identifying your mining session, and here is a new job, so please use it with this one."
Jan Čapek: Whereas in the new protocol we already do have this feature we call an extraNonce prefix build into the protocol. So the pool or the service surveying the job, I don’t want to say pool because pool usually has no incentive to distribute jobs from other pools, but in case of switching to different algorithms for example, sorry two different coins for example you want to be able to supply jobs for Bitcoin, Bitcoin Cash, Bitcoin SV or whatever. Then you don’t want to harass the miners with say, “Oh, please can you just connect somewhere else and then there you’re going to get a new job.”
Jan Čapek: But instead of it the service distributing such jobs can just notify them, “Hey, here’s a new extraNonce prefix, which identifies your mining session and here is a new job and please use it.” That’s it, so we have built this extraNonce one subscription into the protocol.
Stephan Livera: Okay, great. Are there any barriers that you see to miners adopting Stratum v2? Are there any obvious downsides or negatives that they might face?
Jan Čapek: I think the one barrier would be firmware adoption. If you want to take full advantage of the protocol, you need firmware that supports it, so it doesn't really come down to the miners, it comes down to the manufacturers. We try to address this fact by providing Braiins OS, and the protocol also has a reference implementation in the Rust language, so it should be fairly easily readable and very stable, and people can basically build their work on top of that.
Jan Čapek: On the pool side I see there are really good incentives for implementing the protocol because the pools would pretty much start saving bandwidth immediately, which would be manifested immediately in the quality of their service. They can support higher submission rates, so the more frequent that the miner actually submits to the pool because you were saving on the bandwidth you can technically use the data to submit more results more often.
Jan Čapek: This directly is reflected in your mining rewards because the miner from pool perspective is like a small Bitcoin network because it’s, it also has luck, it’s trying to find a share with certain difficulty and sometimes it has more luck and in some periods of time it has lower luck. The variance in the luck is related to the difficulty that the pool assigns to you, so if you have a difficulty that means that you find a block every one second, so this one second is because it follows the Poisson distribution.
Jan Čapek: I think there is a 63% chance that you find a block within this one, sorry the share within this one second, and there’s like a 95% chance that you find it in four seconds. Whereas if you are assigned a difficulty so that your submission rate is five seconds, for example, then the chance that you find the share within five seconds is still the same 63%. So 63% of the shares are found at this time and 95% of the shares would be found within 20 seconds, and when you look in the statistics on the pool side and you will see a variation in your hashrate.
Jan Čapek: Sine the pool, when it divides the rewards looks at your hashrate, if I oversimplify the rewarding scheme, at the time the block has been found by the pool then you may see a variation in your rewards. So this can be directly manifested, but another incentive that I would see for pools to implement this feature is also security part, which we didn’t cover yet at all. I was just speaking about the man in the middle, but we try to propose in a, basically an industry standard extension based on noise protocol framework, which is well proven in cryptographic framework to build handshake protocols.
Jan Čapek: It’s being used by Lightning Network and WhatsApp and many other, and it’s essentially a platform that allows you to generate a handshake protocol that you like without making mistakes in the security flaws. It’s built on modern crypto, so all the communication eventually after the handshake is done is encrypted and it’s using authenticated encryption by default. So basically any message that has been tampered can be easily detected and at the same time it’s encrypted, it’s called authenticated encryption with associated data.
Jan Čapek: Yeah, so I see… The incentives would be clear. What I find a little bit challenging is that we need to polish certain things, for example, for the security part we need to decide how we are going to standardize the way you distribute the public key of the pool, because the whole point of encryption is that it is nice, but it would be worthless if you have no way to verify that the pool that you think you’re talking to is the pool that you really wanted to connect to.
Jan Čapek: So in the web world we have https where you have some X509 certificates, and here we should come up with a little bit more loaded and more flexible scheme that would not have all these administration burdens like X509 management has. So this the only challenging part, which is I like, a pre-condition to run the noise protocol framework protocol designed.
Stephan Livera: Great, and as this Stratum v2 has had input as well from Matt Corallo, a very well known Bitcoin developer and known for his contributions on many mining related contributions as well, and also a security review by Peter Todd, have you had any initial feedback from the community or from other industry players?
Jan Čapek: We're talking to big farms who are actually commenting on the docs pretty loudly. So far we didn't get any major comments saying, "Oh, this is a complete no-go," or, "There's a huge flaw in the protocol," but we need to really polish the small things, so the feedback we've gotten so far seems positive. The collaboration with Peter and Matt was beneficial to the whole project because, again like I said, we used BetterHash as an inspiration for the decentralization part, and Matt understands that it's important to have an industry standard for the new mining protocol. He's actually one of the co-authors of the new standard, because we really wanted to get him on board. So all this adds credit to the whole initiative of fixing the protocol.
Stephan Livera: Great, and for any miners and mining pools who are using Stratum the original protocol, Stratum v1 we might call it, are there any complications around, let’s call it backward compatibility or is there a layer or some way that these different miners and pools can talk to each other?
Jan Čapek: The way we would like to proceed with rolling out the protocol, once the standard is stable or considered final, is that pools would pretty much immediately be able to provide the v2 service, because all they would do is put a simple v2-to-v1 translating proxy in front of the server. So they would not be risking any software issues or bugs or anything. This would be the immediate step, and it would immediately bring the security part and the bandwidth-saving part. It would still not cover the efficiency improvement in block template distribution, because the old v1 protocol doesn't support it, and if you just use a simple proxy it cannot do more than that.
Jan Čapek: Then once this is proven to work they can start implementing it natively into their Stratum servers, and this is exactly what we started doing on Slush Pool where we are already running v2 to v1 proxy so that miners can actually test the service already. It’s on v2stratumslushpool.com and the next step would be once the standard is stable to support the protocol natively inside of Stratum service with all the benefits.
Stephan Livera: If we to just turn to just Bitcoin mining and running a pool a little bit more broadly, can you offer us some insight into what are some of the scalability challenges just running a pool, just generally?
Jan Čapek: Generally the scalability challenge is that your pool is distributed throughout the globe, but at the same time you have to collect data from the whole pool in order to do all the accounting. The challenge is that connectivity doesn't work out very well globally: if you have operations in China and operations in the US and Europe, sometimes you have outages. So you have to find ways around it, with redundant connections, IPsec links that allow you to route your own traffic, and so on.
Jan Čapek: I think these are the biggest challenges even if you try to move over to cloud, we partially use cloud services as well. It’s also not 100% guaranteed that you would be 100% time available, so you need to diversify and then you have the problem because you need to connect those server clouds data centers so that they can talk to each other.
Jan Čapek: I think this is the biggest challenge, basically trying to make sure your infrastructure retains the connectivity globally.
Stephan Livera: Great. Yeah, and another comment I’ve seen is this idea that amongst the Bitcoin mining industry that there are some moves being made amongst pools to try to be more of a one stop shop for their customers. So they might try and provide hardware needs and financial services needs at the same time, rather than just being a traditional pool. Can you offer any of your thoughts on, what’s your view on that?
Jan Čapek: I think this is definitely the way the industry is moving, and we are also trying to participate in this movement. So we're trying to work on new services that would cover these areas.
Stephan Livera: Great, and one other question just around, with Bitcoin mining pools and traditionally in the, coming from even a financial services world or technology world, there is this whole idea of SOC 1 and SOC 2 audits, like technology controls audits. Is that something that you think Bitcoin mining pools would do as well to provide comfort to investors or is that something that you don’t really see as necessary?
Jan Čapek: I think it is an interesting area to explore, because investors who are entering the mining industry are facing a small reality check. For example, if you try buying your mining hardware, how does it work these days? You just pay a lot of money in advance. Is this the industry standard in other domains? I'm not sure about it, so I think those kinds of audits would be something that could actually be demanded, and could be a good product for some of the players in the industry, I would say.
Stephan Livera: Fantastic. Look, I think they’re most of the questions I had, but just summarizing my understanding then. So Stratum v2 as a proposal and a protocol, I think, the main benefits that I can see from our discussion is it helps decentralization of Bitcoins mining and helping stop that censorship as we mentioned. It helps stop that man in the middle attack, and there’s also a bandwidth saving from moving from the JSON format into the Stratum v2 format that’s being used. I suppose in terms of next steps going forward, what are you mainly looking for? Are you looking for review comments on the protocol document? Are you looking for contributions? What would you like to see from the listeners?
Jan Čapek: Yeah, I would like to see comments. I think the most important part is to also finalize the security protocol, so basically decide on the cipher suites. Currently I think there's a choice of only two that the industry is converging on: AES in GCM mode, and ChaCha20-Poly1305. One is better on the server side because you have hardware acceleration; the other one is better on ARM, so if you wanted to have the encryption also at the firmware level, the second one would be the better choice.
Jan Čapek: More importantly we need to really look into what would be the industry standard for distributing the public keys of the pools, like certificate and this certificate would be then used by the miners to verify that they are really talking to the pool that they think they’re talking to. This would be for the security thing.
Jan Čapek: An important part that still needs to get done is changes in Bitcoin Core, so that it’s able to supply multiple block templates in a more efficient way. We didn’t talk to the topic of template distribution protocols today, but this is what it is about. It’s essentially a protocol between the Bitcoin core and the provider of the jobs, so that the job provider’s receiving a stream of Bitcoin jobs that he can provide to the miners to negotiate.
Jan Čapek: So this still needs to be designed and implemented on the Bitcoin Core side, but the protocol itself has been outlined, so the challenges would be on the Bitcoin Core developer side. That’s about it.
Stephan Livera: Okay, great. I think that’s it for today, but Jan if you could make sure you let the listeners know where they can find you and find the protocol as well?
Jan Čapek: If you want to research the protocol, just go to stratumprotocol.org. If you are interested in our open source initiative called Braiins OS, go to braiins-os.org. If you want to check out our pool, just go to slushpool.com, and you can find me on Twitter under Jan Braiins. That's pretty much it.
Stephan Livera: Fantastic, that's awesome. It's been very educational for me, so thank you for joining me today.
Jan Čapek: Thanks for having me on your show. Hopefully I was not too technical, but this topic deserves it a little bit.