Why I’m becoming a white hat hacker

Fri 06 Jan 2023

In 2016 I excitedly joined a small Bitcoin exchange to flex my nascent cryptocurrency knowledge and hopefully learn some more. One of the other employees, let’s call him Harry, was really keen to tell me more about Ethereum, and this magical new “smart contract” called The DAO.

Even then, I had enough background in Computer Science that when he dropped the term Turing Complete I gave a moment’s pause.

I remember thinking that sounded pretty dangerous. A little known fact, even among cryptocurrency enthusiasts, is that every Bitcoin transaction actually contains a small program written in a language called Bitcoin Script. Its use cases are limited to verifying signatures (including transactions with signatures from multiple parties) and time locking funds. Bitcoin Script is hobbled in a very important way. In fact, it was designed that way on purpose.

That sounds pretty counter-intuitive, but there’s a good reason for it. Bitcoin Script is a non-turing-complete language. There are advantages that come with this. Programs written in such a language are way easier to prove correct, since by definition they can’t compute all computable things. Since, in some sense, proving things correct involves making some mathematical argument about a colossal number of mathematical entities, the fact that the language can’t express as much has the benefit of making things way easier to prove. In this sense a non-turing-complete language’s limited expressiveness is its strength.

Ethereum was explicitly designed around a turing-complete smart contract language. Simple, non-expressive languages are great for correctness, but you can’t build a thriving ecosystem of interesting financial instruments, games, governance structures (and so on) without a fully expressive programming language. Ethereum was meant to push the boundaries of what a cryptocurrency could be. It delivered on this promise almost immediately. Whereas Bitcoin was really limited to only multi-sig transactions and timelocks with Bitcoin Script, Ethereum bloomed with smart contracts defining new tokens (which developed their own independent value) and decentralised organisation contracts that provided a means by which projects could be proposed, voted on, and funded.

These advantages came with an obvious – and weighty! – downside, one that I could only see that early on because of my luck in attending a university that taught formal verification and an unorthodox career choice to work on formally verifying an operating system micro-kernel. Although I knew next to nothing about Solidity, my background told me that I should be worried. Solidity would allow any user, of any skill level, to write programs that might conceivably end up containing very large sums of money inside them. And yet, it was Ethereum’s stated goal that the flow of these funds would not be directed by the whims of human beings, only code.

Because of this, it’s of paramount importance that smart contracts be correct. Given the sheer amount of wealth, even at 2016 exchange rates, sloshing around in the Ethereum blockchain it was conceivable that some of these smart contracts might end up with hundreds of thousands or even millions of dollars of value inside them and yet be vulnerable to very simple logic bugs.

What I didn’t expect was the rapidity with which this happened. When Harry told me that The DAO had over $60M of ETH in it, I didn’t want to believe it. I tried to tell Harry that I thought there was a good chance it could get hacked but, at the time, I don’t think he’d even heard the term “turing complete” and telling him “that sounds dangerous” was much like telling a toddler going through their “I love trucks” phase that they can also maim and kill.

Two weeks later the DAO hack occurred and sent the entire Ethereum community into chaos which led to serious debates about whether “code is law” could be taken seriously, and eventually resulted in a hard fork that split Ethereum forever in twain: into Ethereum which has remained a vibrant scene of experimentation and technical progress and Ethereum Classic, a place for die-hard “code is law” types.

I walked into work the day after the DAO Hack happened feeling like Nostradamus. The other employees seemed to think that I was in possession of some kind of secret knowledge but the story was far more prosaic than that. I was simply familiar with the fact that programming languages with great expressive power are also far more vulnerable to error. Paraphrasing Stan Lee you could say with great expressive power comes great vulnerability.

The DAO hack kindled a burning question within me. Could formal methods help with the process of writing correct software? My gut told me and still tells me yes. But I didn’t (and still don’t) know for sure. I set out to answer this question in late 2016. Unfortunately, I got side-tracked for five years working in the “blockchains for traditional finance” space for a start-up. Notionally, I was hired to work on formal verification of financial software but, like so many start-ups, the organisation did not have the discipline to use my skills in the role they had advertised. Much more pressing business opportunities had sprung up and I was reassigned to work on them. To my regret, during this period of my life I barely kept abreast of what was going on in Ethereum.

It was only in March 2022 of this year that I was finally able to return to the question of whether formal methods could be used to make smart contracts less hackable. A lot had changed in Ethereum during my hiatus, but what had not changed (and in fact had only got worse) was the number and severity of the hacks on Ethereum’s and other EVM-based blockchains. To be fair, this was partially due to the maturity of the ecosystem. Decentralised Finance (DeFi) projects had blossomed during my absence and were providing real utility to large numbers of users. Put simply, the attack surface was larger.

However, the vulnerabilities which led to some of the hacks were still of a very trivial nature. The logic errors leading to loss of funds were not particularly subtle and revealed fuzzy thinking and a lack of rigour. On the positive side, the auditing and bug bounty ecosystem had matured a lot. Both activities had prevented multi-million dollar hacks. In the case of bounty hunting the dollar amounts were even quantifiable since executable proofs of concept could demonstrate just how much money had been at risk.

The use of formal methods is still in its infancy not only in the cryptocurrency space but also in the wider field of software engineering in general. One of the most common arguments against the use of formal methods is that is that it costs too much, both in time and money. To formally verify even a small codebase takes a significant amount of skill and a lot of time. These two factors translate into significant monetary costs. Yet recent history shows us that multi-million dollar hacks happen all the time. I’m subscribed to a newsletter that send me links to all the hacks that happened that week. Not a single week has gone by without at least a million dollars lost in one DeFi smart contract or another.

Given that the downsides of logic errors in smart contracts are so huge, the cost-benefit analysis seems to clearly come out in favour of using formal methods. The caveat is that this is only true if they are proven to be effective. Although I’m loath to admit it, I’m not sure that their effectiveness has been conclusively demonstrated. Yet I strongly believe that it is worth trying since I don’t believe that auditing and bug bounties alone are adequate to keep hacks to a tolerable level. Unless we can drastically reduce the number and size of hacks we may ultimately see the death of DeFi experiment, which would be a huge shame given its advantages over traditional finance.

It’s now time to reveal my goals and overall strategy. Over the years, I’ve come to highly value the concept of tacit knowledge. Tacit knowledge is simply that kind of knowledge that is hard to codify/express and difficult to pass onto others via written/verbal communication. Many physical skills have a high degree of tacit knowledge. You might pick up a tip or two from reading a book about how to ski, but you’ll only really master the skill through vast amounts of practice.

If I’m to make any progress in preventing hacks then I need to know, tacitly, how hacking is done. I need to get inside the mind of the hacker by becoming one myself. Only then can I meaningfully evaluate whether formal methods are a useful tool in fending off attacks. The fact that auditing/bounty hunting can be lucrative is part of the strategy. After (one too) many disappointing work experiences I’d like to see what can be accomplished by self-funding this research and working with the community in a more collaborative manner. I (somewhat jokingly) think of it as the kind of self-directed and interference-free research that was done by 19th century landed gentry.

So far my journey has gone better than I could have hoped. By far the place I learned the most was Code 4rena. It’s a highly competitive environment and I only managed to earn a modest $25K there over 5 months. But I don’t think I could have learned at a faster rate anywhere else. C4 trained me up to be “match fit” for auditing. I credit it with giving me the skills to land a $150K bounty in September 2022.

Once I have sufficiently mastered the art of finding bugs I will then flip into the role of preventing them via the use of formal methods. I don’t expect to start this journey until at least 2024 as there is just so much to learn. The only way to truly prove the benefits of formal methods will be to put some skin in the game, develop a DeFi project with their aid and then to expose that project to the combined hacking power of the black/white hat hacking community at large.

Until then I’m going to thoroughly enjoy learning the art of security vulnerability research.