Non-fungible token (NFT) and the ERC-721 standard

NFTs (non-fungible token) are one of the latest and most exciting developments in the blockchain universe, with use cases ranging from such essential things as breeding cute virtual kitten digitally on the blockchain to digital auctions as conducted earlier this year by the renowned fine art auctioneer Sothebys’s. In this post, we will explain why an NFT is nothing but a smart contract with specific functionality and talk about the ERC-721 standard that formally defines NFTs.

Non-fungible token

In a previous post in this series, we have looked at token according to the ERC-20 standard. We have seen that in its essence, a token is implemented by a smart contract that is maintaining a mapping of accounts to balances to track ownership in a digital currency.

As for a traditional currency, documenting ownership by just keeping track of how many token you own works because such a token is completely fungible – any two are the same. If you hold a token, say the Chainlink (LINK) token, the blockchain records a balance, say 100 LINK, and if you transfer 20 LINK to another account, it does not make sense to ask which of the 100 LINK you have transferred.

This is a good approach to model a currency, but sometimes, you want to achieve something else – you want to document ownership in a uniquely identifyable asset, say a piece of art, or a property. To do this, you would assign a unique ID to each asset, and then keep track of who owns which asset by maintaining a mapping of asset IDs to owners. This is more or less what a non-fungible token does.

Correspondingly, a non-fungible token contract (NFT contract) is essentially a smart contract that maintains a data structure to document ownership in a specific item, modeled as a mapping from item IDs (the so-called token ID) to the current owner. Suppose, for instance, an artist releases a collection of digital pieces of art, numbered from 1 to 100, and sells them as NFTs. Then, the token ID would range from 1 to 100, every ID would represent the corresponding piece of art, and the mapping would document who owns which item.

Apart from the mapping itself, the contract would also have to offer methods to transfer ownership, say a method transfer that would accept a token ID and the new owner as input and would adjust the mapping to update the owner accordingly. You could also come up again with an approach to pre-approve transfers so that the new owner could actively call into the contract to claim ownership, and, in addition, you would probably add a few convenience functions, for instance to figure out the current owner for a given token ID.

The ERC-721 standard

Similar to the ERC-20 standard, the community has developed a standard – the ERC-721 standard – for smart contracts representing NFTs. However, the ERC-721 standard is considerably more complicated than the ERC-20 standard. Here is an overview of its methods and events.

That might look a bit intimidating, but do not worry – we will go through each of the building blocks step by step, and things will fall into place.

Let us start with balances. ERC-721 defines two different approaches to balances. First, there is the ownerOf method which simply returns the owner of a specific asset, identified by the token ID. Essentially, this simply queries the mapping of token IDs to current owners that the contract maintains. Here, the token ID is a 256-bit integer, i.e. a uint256 in Solidity. In addition, there is still a balanceOf method that returns the total number of NFTs owned by a specific account (this is useful in combination with the enumeration features that we will discuss below).

Next, let us start to discuss transfers. The easiest way to initiate a token transfer is to use the method transferFrom. This method allows a caller to transfer a given token, identified by the token ID, from its current holder to different address. Of course, the sender of the message needs to be authorized to perform the transfer – the current owner of a token is always authorized, but there are more options, we will get to this point below.

This is essentially the same structure and logic as the transfer method of a fungible token according to ERC20. However, there is a risk associated with using this method. Suppose that you use this method to transfer a token to a certain address, and that the receiver is a smart contract. Then the NFT will be transferred to the smart contract, and only a transaction originating from the smart contract can transfer the token to another account. If the smart contract is not prepared for this, i.e. if it does not have a method to initiate such a transfer, the NFT is forever lost (unless, maybe, the contract is an upgradeable contract) and therefore the NFT will remain assigned to the contract forever.

To at least partially mitigate this risk, the ERC-721 standard encourages contracts that are capable of managing NFTs to make this visible by implementing a marker interface. Specifically, a contract that is prepared to receive NFT should implement a method

function onERC721Received(address operator, address from, uint256 tokenId, bytes data) external returns(bytes4);

The idea behind this is similar to the receive and fallback functions in Solidity. Of course, the pure presence of this function does not say anything about its implementation, but at least it indicates that the author of the contract was aware of the possibility that the contract might receive an NFT.

In order to restrict transfers to transfers to either an EOA or a contract that implements the marker interface, an NFT contract offers the method safeTransferFrom. This method is very similar to an ordinary transfer, with two exceptions. First, it is supposed to check whether the receiving address is a smart contract (or, which is not exactly the same thing, has a non-zero code). If yes, it will try to invoke the method onERC721Received of the target contract which is supposed to return a defined sequence of four bytes (a “magic value”). If the target contract does not implement the method, or the method exists but returns a different value, then the transfer will fail.

Second, the method safeTransferFrom optionally accepts a data field that can contain an arbitrary sequence of bytes which is handed over to onERC721Received of the recipient. The target contract can then, for instance, log this data or perform some other operations like updating a balance and storing the passed data as a reference.

Let us now turn to authorizations – who is allowed to initiate a transfer? Of course, the owner of an NFT is always authorized to transfer it. In addition to this, the withdrawal pattern is supported as well, similar to the ERC20 standard. In fact, there is a method approve that the owner of an NFT can invoke to authorize someone else to transfer this token. Approvals can also be explicitly revoked, and of course approvals are reset if ownership for an NFT changes.

In addition to this explicit approval that refers to a specific token ID, it is also possible to register another address as being authorized to make any transfer on your behalf, i.e. as an operator. Once defined, an operator can transfer any token that you own and can also make approvals and therefore authorize withdrawals. This global approval method has no equivalent in the ERC20 standard, but there is an extension (EIP-777) to the standard which adds this functionality for fungible token as well.

Finally, the standard defines events that are supposed to be emitted when a transfer is made, an approval is granted or revoked or an operator is named or removed.

The enumeration extension

The ERC721 standard makes it easy to figure out the current owner of an NFT once you know the token ID – simply call ownerOf using the token ID as argument. However, the token ID can, in general, be any 256 bit number, and there is no reason to assume that this will always be a simple sequence. As a consequence, it is not obvious how to figure out which token IDs are actually in use, i.e. which token have already been minted.

To address this, the standard defines a set of optional methods that allow a user to enumerate all existing tokens. This proceeds in two steps. First, the method totalSupply is supposed to return the total number of token in existence, i.e. token that have already been minted. Then, the method tokenByIndex can be called with an index less than the total supply to get the ID of a specific token. Similar, the balanceOf method (which is mandatory) returns the number of token held by a specific owner, and tokenOfOwnerByIndex can be used to enumerate these token. Implementing these optional methods requires an additional data structure in the contract, for instance an array that contains all token IDs.

This enumeration extension is the only standardized way to get a list of existing token IDs. It forces the contract, however, to implement and maintain additional data structures and I would assume that many contract owners have chosen not to implement it (in the next post, we will look at a few real-word examples, and as a matter of fact, none of them implements this extension). Alternatively, a contract could emit a (non-standard) initial log entry upon contract creation to indicate all token IDs that are available directly after contract creation, and then a user could monitor the Transfer events which, per the specification, should be emitted if an additional token is minted.

That concludes our post for today. You might have noted that we have not yet discussed an extension that is indicated in the diagram at the top of this article – the metadata extension. This extension touches upon an interesting question – if an NFT documents ownership in (say) a digital asset, where is the actual asset stored? This question and its ramifications will be the topic of the next post in this series.

Compiling and deploying a smart contract with geth and Python

In our last post, we have been cheating a bit – I have shown you how to use the web3 Python library to access an existing smart contract, but in order to compile and deploy, we have still been relying on Brownie. Time to learn how this can be done with web3 and the Python-Solidity compiler interface as well. Today, we will also use the Go-Ethereum client for the first time. This will be a short post and the last one about development tools before we then turn our attention to token standards.

Preparations

To follow this post, there is again a couple of preparational steps. If you have read my previous posts, you might already have completed some of them, but I have decided to list them here once more, in case you are just joining us or start over with a fresh setup. First, you will have to install the web3 library (unless, of course, you have already done this before).

sudo apt-get install python3-pip python3-dev gcc
pip3 install web3

The next step is to install the Go-Ethereum (geth) client. As the client is written in Go, it comes as a single binary file, which you can simply extract from the distribution archive (which also contains the license) and copy to a location on your path. As we have already put the Brownie binary into .local/bin, I have decided to go with this as well.

cd /tmp
wget https://gethstore.blob.core.windows.net/builds/geth-linux-amd64-1.10.6-576681f2.tar.gz
gzip -d geth-linux-amd64-1.10.6-576681f2.tar.gz
tar -xvf  geth-linux-amd64-1.10.6-576681f2.tar
cp geth-linux-amd64-1.10.6-576681f2/geth ~/.local/bin/
chmod 700 ~/.local/bin/geth
export PATH=$PATH:$HOME/.local/bin

Once this has been done, it is time to start the client. We will talk more about the various options and switches in a later post, when we will actually use the client to connect to the Rinkeby testnet. For today, you can use the following command to start geth in development mode.

geth --dev --datadir=~/.ethereum --http

In this mode, geth will be listening on port 8545 of your local PC and bring up local, single-node blockchain, quite similar to Ganache. New blocks will automatically be mined as needed, regardless of the gas price of your transactions, and one account will be created which is unlocked and at the same time the beneficiary of newly mined blocks (so do not worry, you have plenty of Ether at your disposal).

Compiling the contract

Next, we need to compile the contract. Of course, this comes down to running the Solidity compiler, so we could go ahead, download the compiler and run it. To do this with Python, we could of course invoke the compiler as a subprocess and collect its output, thus effectively wrapping the compiler into a Python class. Fortunately, someone else has already done all of the hard work and created such a wrapper – the py-solc-x library (a fork of a previous library called py-solc). To install it and to instruct it to download a specific version of the compiler, run the following commands (this will install the compiler in ~/.solcx)

pip3 install py-solc-x
python3 -m solcx.install v0.8.6
~/.solcx/solc-v0.8.6 --version

If the last command spits out the correct version, the binary is installed and we are ready to use it. Let us try this out. Of course, we need a contract – we will use the Counter contract from the previous posts again. So go ahead, grab a copy of my repository and bring up an interactive Python session.

git clone https://github.com/christianb93/nft-bootcamp
cd nft-bootcamp
iypthon3

How do we actually use solcx? The wrapper offers a few functions to invoke the Solidity compiler. We will use the so-called JSON input-output interface. With this approach, we need to feed a JSON structure into the compiler, which contains information like the code we want to compile and the output we want the compiler to produce, and the compiler will spit out a similar structure containing the results. The solcx package offers a function compile_standard which wraps this interface. So we need to prepare the input (consult the Solidity documentation to better understand what the individual fields mean), call the wrapper and collect the output.

import solcx
source = "contracts/Counter.sol"
file = "Counter.sol"
spec = {
        "language": "Solidity",
        "sources": {
            file: {
                "urls": [
                    source
                ]
            }
        },
        "settings": {
            "optimizer": {
               "enabled": True
            },
            "outputSelection": {
                "*": {
                    "*": [
                        "metadata", "evm.bytecode", "abi"
                    ]
                }
            }
        }
    };
out = solcx.compile_standard(spec, allow_paths=".");

The output is actually a rather complex data structures. It is a dictionary that contains the contracts created as result of the compilation as well as a reference to the source code. The contracts are again structured by source file and contract name. For each contract, we have the ABI, a structure called evm that contains the bytecode as well as the corresponding opcodes, and some metadata like the details of the used compiler version. Let us grab the ABI and the bytecode that we will need.

abi = out['contracts']['Counter.sol']['Counter']['abi']
bytecode = out['contracts']['Counter.sol']['Counter']['evm']['bytecode']['object']

Deploying the contract

Let us now deploy the contract. First, we will have to import web3 and establish a connection to our geth instance. We have done this before for Ganache, but there is a subtlety explained here – the PoA implementation that geth uses has extended the length of the extra data field of a block. Fortunately, web3 ships with a middleware that we can use to perform a mapping between this block layout and the standard.

import web3
w3 = web3.Web3(web3.HTTPProvider("http://127.0.0.1:8545"))
from web3.middleware import geth_poa_middleware
w3.middleware_onion.inject(geth_poa_middleware, layer=0)

Once the middleware is installed, we first get an account that we will use – this is the first and only account managed by geth in our setup, and is the coinbase account with plenty of Ether in it. Now, we want to create a transaction that deploys the smart contract. Theoretically, we know how to do this. We need a transaction that has the bytecode as data and the zero address as to address. We could probably prepare this manually, but things are a bit more tricky if the contract has a constructor which takes arguments (we will need this later when implementing our NFT). Instead of going through the process of encoding the arguments manually, there is a trick – we first build a local copy of the contract which is not yet deployed (and therefore has no address so that calls to it will fail – try it) then call its constructor() method to obtain a ContractConstructor (this is were the arguments would go) and then invoke its method buildTransaction to get a transaction that we can use. We can then send this transaction (if, as in our case, the account we want to use is managed by the node) or sign and send it as demonstrated in the last post.

me = w3.eth.get_accounts()[0];
temp = w3.eth.contract(bytecode=bytecode, abi=abi)
txn = temp.constructor().buildTransaction({"from": me}); 
txn_hash = w3.eth.send_transaction(txn)
txn_receipt = w3.eth.wait_for_transaction_receipt(txn_hash)
address = txn_receipt['contractAddress']

Now we can interact with our contract. As the temp contract is of course not the deployed contract, we first need to get a reference to the actual contract as demonstrated in the previous post – which we can do, as we have the ABI and the address in our hands – and can then invoke its methods as usual. Here is an example.

counter = w3.eth.contract(address=address, abi=abi)
counter.functions.read().call()
txn_hash = counter.functions.increment().transact({"from": me});
w3.eth.wait_for_transaction_receipt(txn_hash)
counter.functions.read().call()

This completes our post for today. Looking back at what we have achieved in the last few posts, we are now proud owner of an entire arsenal of tools and methods to compile and deploy smart contracts and to interact with them. Time to turn our attention away from the simple counter that we used so far to demonstrate this and to more complex contracts. With the next post, we will actually get into one the most exciting use cases of smart contracts – token. Hope to see you soon.

The Ethereum ecosystem

So far we have mainly discussed what a node needs to do – processing and validating blocks and transactions and maintaining state. This by itself is not particularly useful – we need components to access nodes, manage accounts and balances and to develop, test and deploy smart contracts. There is an abundance of different tools and solutions around the Ethereum blockchain that do exactly this, and we will spend some time exploring this ecosystem today.

Clients

From a technical point of view, the Ethereum network is a peer-to-peer network, formed by individual nodes each of which runs a client software. This software will typically

  • discover peers, i.e. other nodes in the network
  • notify peers of new blocks that appear in the network, typically because a miner has generated them
  • maintain the state of the blockchain
  • process blocks and make updates to the state

In addition, clients will most likely offer an API which other applications can use to get information on the blockchain, to query balances, inspect blocks, send transactions and so forth. There are a couple of clients that do all this, here is a short list which is probably far from being complete.

  • The Go-Ethereum (geth) client is the client maintained by the Ethereum foundation, and, as the name suggests, written in Go. This is the client that we will use for the later posts in this series, when we discuss how to start and run a client in a private development network or one of the official test networks. According to the Etherscan node tracker, more than 60% of the nodes in the Ethereum production network are running geth.
  • Ganache is a client mostly intended for development and testing purposes. It is part of the Truffle framework for Ethereum development and written in JavaScript / Node.js. Ganache is also the development server running as part of the Brownie framework that we will discuss and use in a future post
  • OpenEthereum, formerly known as Parity, is an Ethereum client written in Rust. It was started by Parity Technologies and later transitioned to the newly founded OpenEthereum project. It is the second-most popular client in the production network, accounting for roughly every fourth node
  • Hyperledger Besu is an Ethereum client written in Java, which is developed by the Hyperledger project hosted by the Linux foundation and mainly targeting permissioned corporate networks

dApps and gateways

To build something useful, we need to allow applications to interact with the blockchain. As nodes offer an API, we can do this – our application can try to identify a node, connect to this node and use the API to talk to the blockchain.

In practice, this poses two problems. First, to connect to a node, you need to know its URL. Thus your application would somehow need access to a list of available nodes. If, however, you do not control the nodes, this list will be volatile, and it can be a considerable effort to keep this list up to date. If you do not want to run your own node, you can use one of the organisations that offer nodes as a service. I think of these services as gateways into the Ethereum network – they basically provide access to a node that your application can use and make sure that the node is stable and up and running. Examples for node providers are

  • Infura which we will use in a later post in this series to access the test network
  • Alchemy is another provider that also offers a free tier for developers
  • Cloudflare also offers an Ethereum gateway

Be careful, not all providers offers all API requests, if in doubt consult the documentation on the respective web page.

The second problem that we have is to actually prepare and submit the API requests. Of course this can be done with custom built code that directly assembles the JSON RPC requests, but over time, libraries that encapsulate access to the API have been developed. One very popular libraries is web3.js which is a JavaScript library to access an Ethereum node. In addition to methods that correspond to methods of the JSON RPC API, it also contains a few helper functions for account and wallet management or the handling of smart contracts. We will use web3.js for our sample frontend in a later post in this series.

Being a huge fan of Python, I was of course looking for a Python library doing approximately the same thing, and in fact there is one, which, not really surprisingly, is called web3.py. We will use this as well throughout this series to demonstrate how we can interact with a smart contract from within a Python application or even deploy a smart contract.

We are now ready to discuss a vision called Web3 by its proponents (thus explaining the name of the libraries). The idea of Web3 is that instead of having centralized applications, running on servers controlled by single organisations, applications are fully distributed. Application logic is residing mostly in browsers (i.e. JavaScript applications) and smart contracts, and the frontend applications interact with the blockchain and smart contracts via nodes. In this model, code is executed either in your frontend or within the EVM, i.e. fully decentrally, and data is stored mainly in the blockchain (and in file storage services that we will discuss below). Therefore these applications are called dApps (decentralized applications). I will not discuss the pros and cons of this architecture, but simply wanted to make sure that you have heard the term. What we will develop throughout this series is, in fact, a dApp, consisting of a smart contract and a React frontend running in your browser.

Development environments

The diagram above displays a more or less complete runtime architecture. To develop frontends and smart contracts efficiently, however, you will typically want to leverage additional tools that allow you to write your contracts in a high-level language, compile them to bytecode, quickly spin up a test environment which a defined state and run tests of your smart contracts and your frontend. Over time, a large variety of development environments and frameworks have appeared to support this.

First, you will of course need to decide for a high-level language for your smart contracts. Essentially, there are two choices. The most common one is Solidity, which is a mixture of languages like Java, JavaScript or C++ with some object-oriented features like overloading, inheritance and interfaces. We will use Solidity throughout this series. A second choice is Vyper which is a bit more “Pythonic” and replaces the older Serpent, but even though I am a huge fan of Python, I have not really looked at it yet. Finally, there is LLL which uses a Lisp-like syntax and was one of the first languages for smart contract development, but does not seem to be used very often in practice any more.

Once you have decided which language and compiler to use, you will have to pick the actual development environment. Especially for first steps, the Remix IDE is an excellent choice. Remix is an full-fledged development environment running entirely in your browser. With Remix, you can edit source code, compile, deploy into a private development blockchain running in a JavaScript sandbox in your browser (or to a network of your choice) and test and debug your contracts. We will use Remix for our first smart contract to get things going quickly.

If you prefer to work locally and feel at home in the JavaScript ecosystem, you might want to take a look at the Truffle suite. Truffle allows you to compile and deploy your smart contracts using JS as a a scripting language or using a JS based interactive console. Truffle spins up its own integrated local development node and allows you to easily run unit tests. As I prefer Python over JavaScript for this sort of tasks, I will instead use Brownie which (apparently being heavily inspired by Truffle) offers a similar functionality for Python. Using Brownie, you can compile your contracts, start a console which will also bring up a local Ganache node automatically (or connect to an existing node), deploy your contracts and interact with them. Brownie can also be imported and used as a library which allows you to script things, and integrates nicely with pytest, so that you can use pytest and Brownie to easily write unit tests for your smart contracts and even create coverage reports.

Storage, data and analytics

Let us now look at a few services that are available in the Ethereum universe that more advanced applications might need. The first type of service that you will sooner or later need is a storage service. Storing data in the blockchain is expensive, and most data related to the blockchain, like a piece of digital art minted as a token, is not actually stored in the blockchain itself. Instead, the blockchain holds a reference to the image (and maybe a cryptographic hash) and the image itself is stored somewhere else.

Of course you could use any type of storage solution, maybe even your favorite cloud service provider. This, however, would mean to again rely on a centralized infrastructure. If you do not want this, you will have to go for a decentralized file storage platform. A popular choice is IPFS, which is a peer-to-peer network technology for storing files. If you store a file on IPFS, it is essentially cut into chunks, and each peer will store one or more of these chunks, a bit similar to BitTorrent. In addition, the location of the file in IPFS is referenced by the hash of the files content (the so-called CID). This implies that when the content changes, the hash changes, so that once stored, the file is effectively immutable, which is one of the reasons why this file storage solution is often used as part of blockchain-based applications. Note, however, that a node is free to drop your file at any time, so you might want to find a node which is ready to pin your content. Providers like filecoin.io offer similar decentralized file storage solutions.

In addition to storing data, the Ethereum network produces a huge amount of data every second – blocks, transaction, state updates, log and so forth. Several services exist that allow you to easily scan and explore that data. One of the best known services of this type is Etherscan. Using Etherscan, you can retrieve and analyze individual blocks and transactions, list all transactions related to a specific account, track token and retrieve various types of statistics. Etherscan also offers an API and other development services, like a registry for smart contracts. OpenSea, which is a market place for NFTs, also offers services to discover and display token. Here is how the NFT that we will actually build and deploy in the course of this series looks like in OpenSea.

Wallets

When you want to interact with the Ethereum blockchain, you will need an account and some piece of software that assembles and submits transactions. Wallets combine these two functionalities. They manage your accounts, allow you to inquire your balance and to submit transactions and can often also manage token.

Some wallets run in your browser, others are mobile apps or desktop applications. A very popular wallet that is available as a browser plugin is MetaMask. Be aware, however, that if you use MetaMask, your private keys will be stored in the browser storage and are therefore theoretically accessible by malicious JavaScript code. In addition, if you reset your profile, the data is lost, so be careful when using such a solution, think about security and have a solid backup strategy. MyEtherWallet is another popular choice.

Frameworks like web3.js also have some limited wallet functionality built into them, and the React frontend that we will build does actually use this library to offer some rudimentary functionalities that a wallet typically has.

This completes our short journey through the Ethereum ecosystem. Of course, this list is far from complete and constantly evolving, we have, for instance not talked about oracles like ChainLink, framework providers like OpenZeppelin or market places. The tools and services that we have covered are, however, sufficient to get us started, and in the next post, we will actually get our hands dirty for the first time and build our first smart contract in Solidity, switching from the slightly more theoretical discussion so far to a more hands-on style. See you!

Smart contracts and the Ethereum virtual machine

In my last post, I have taken you through the foundations of the Ethereum blockchain, namely state, transactions and blocks and how they are related. We have seen that nodes update their local state by executing transactions contained in blocks. Today, we will take a closer look at how the transactions are actually executed and discuss the killer feature of the Ethereum blockchain – the Ethereum virtual machine and smart contracts.

Transaction execution

The Ethereum blockchain can be described as a state machine. At a given point in time, a node holds a certain state. Then a block is processed, i.e. inserted into the copy of the blockchain that the node holds, and the transactions contained in the block (and actually, as we will see, the block itself) make changes to the state. After the block has been processed, the node holds a new state. Formally (and this is what the yellow paper does) the new state is therefore a function of the previous state and the block. As all nodes agree on the protocol how the state is updated, and all nodes agree on the blocks and transactions, all nodes will eventually hold the same state.

To understand the changes to the state that are made during the processing of a block, there are two sources that we can consult. First, there is the yellow paper itself, which is a bit hard to read as one needs to get used to the notation, but is the authoritative source which serves as the official specification. Second, we can look at the code of known implementations, like that of the geth client. It is actually quite instructive to compare the yellow paper and the structure of the code and compare them, so let us do this.

We start our journey with the code. One of the central objects in the code base is the BlockChain class, defined in core/blockchain.go. This class represents the copy of the blockchain held by the node, along with the up-to-date state. It has a method InsertChain that first validates block headers and then uses a StateProcessor to perform the updates to the state caused by a block or collection of blocks. Its Process method is called with a block and a current state, and applies the changes represented by the transactions in this block to the state.

Time to switch to the yellow paper for a moment. In section two, the paper defines three state transition methods.

  • First, there is a state transition method denoted by a capital pi, that accepts a state and a block and returns the resulting new state – this state transition function could be called the state transition function on the block level
  • Then, it does the same on the transaction level – this state transition function, denoted by a capital upsilon, therefore accepts a state and a transaction and returns the updated state
  • Finally, it defines a finalization function denoted by a capital Omega and declares that the block level state transition function is obtained by first applying all state transitions and then applying the finalization function to the resulting state

This gives us a first idea how the state update works, and is actually reflected nicely in the structure of the source code.

  • Given a new block, loop through all transactions in the block
  • For each transaction, invoke the function applyTransaction to process this transaction – so this function corresponds to the state transition function on block level
  • Once this has been done, finalize the block, corresponding to the block finalization function, this involves mainly the calculation and transfer of the block reward
  • all this happens in the method Process which thus corresponds to the block level transition function

Looking at this function, there is an interesting point – right at the start of the function, a modification to the state is hardcoded if the block being processed is equal to the block number of the famous DAO fork. So if you ever wanted to know how this fork really works, here is the answer – all clients that accepted the fork simply agree on a modification of the state transition function which transfers all funds in the DAO contract to a new contract from which the legitimate owner could withdraw it. In other words, the fork has modified the state transition function (this is sometimes called an irregular state change).

Let us now dive a little deeper into what processing a transaction actually means. Again, we can look at the source code (with the main part of the processing being done here) or at the yellow paper, specifically at sections six to nine. To simplify things a bit, let us assume for the time being that our transaction is an ordinary transaction, i.e. that we simply transfer Ether to some other account and no smart contract is involved (i.e. that the recipient is an EOA). In this case, the processing is actually not overly complicated.

First, a few checks are executed so see whether the transaction is valid – this includes checking that the nonce of the transaction is equal to the current nonce of the sender (before applying the state changes). As part of these checks, the upfront cost is deducted from the balance of the sender, which is defined as the gas limit in the transaction times the gas price (if the gas limit is not exhausted, the remaining gas will be refunded at the end of the transaction). Then the nonce of the sender is incremented by one, and the actual transaction processing starts here by invoking the Ethereum virtual machine. We will discuss what it does in the next section, but for the moment, be assured that if the recipient account contains no code, it simply transfers the Ether represented by the value field of the transaction from the sender to the recipient.

Once this is done, the remaining gas is refunded, and the gas used is credited to the miner (more precisely the beneficiary or coinbase of the block being processed). With this, the processing is complete and the next transaction starts.

The Ethereum virtual machine

We have now achieved a good overview of what happens during transaction processing, but we have glossed over an important point – the invocation of the Ethereum virtual machine (EVM) which happens here. So what is the EVM?

Technically, the EVM is a virtual machine, very similar to the Java virtual machine (JVM) that knows a certain set of instructions and a certain state – a stack, a memory and a program counter. When this machine executes a series of instructions known as bytecode, every instruction manipulates the state of the machine. The set of instructions (opcodes) is rich enough to make the EVM Turing complete and described in appendix H of the yellow paper. Just to give you an idea, here are some examples for the available instructions.

  • arithmetic instructions like ADD, SUB or MULT
  • comparisons and boolean logic (AND, OR, LT, GT, ..)
  • operations to access the execution context, like the balance of an account (BALANCE), the sender of the transaction (ORIGIN), or the GASPRICE
  • Flow control operation, like CALL (to call another smart contract) or REVERT (abort execution of a contract)
  • operations to access the storage of the account under which the contract is executing, like SSTORE and SLOAD

Each of these operations has an associated gas value that is consumed when the EVM executes this operation.

Now, whenever a transaction is processed, the node will check whether the recipient address has a non-zero code. If no, the processing is as described above. If yes, the address which is the target of the transaction is interpreted as a smart contract, and its associated code will be interpreted as bytecode and will be executed by the EVM (note that this happens after the contracts balance has been increased by the value of the transaction, so you can already spend this amount in the contract). Thus a smart contract is stored in the blockchain (as part of the state of its address) and its execution is triggered by making a transaction to the contract address, which might or might not involve a transfer or Ether.

During this execution, the code can make changes to the state, as it can write to the storage, and it can use the CALL operation to initiate a message call to another account which works very similar to a transaction and also has a parameter value to transfer Ether. All these changes will be incorporated into the final state, and thus the EVM execution becomes part of the state transition function.

Physically, the EVM is running on each node, but as the definition of the EVM is part of the consensus mechanism of the blockchain and the execution is fully deterministic, all nodes will arrive at the same updated state. We can therefore think of the EVM as a distributed virtual machine running synchronously on each node of the blockchain and updated “the” shared state of the blockchain.

As the EVM is Turing complete, this is a very powerful mechanism. Theoretically, a smart contract can perform any operation on the state that you can imagine. Of course, complex smart contracts consume a lot of gas and therefore their execution drives up transaction fees. This is actually a built-in security mechanism to avoid that someone deploys a smart contract that never completes (for instance by implementing an infinite loop) and blocks all nodes. In fact, as every instruction consumes gas, the gas limit is reached at some point (or the balance of the senders account is exceeded), and the execution stops as the transaction is running out of gas. Even worse (for the attacker), the consumed gas is lost, so there is actually a strong incentive to keep the complexity of smart contracts low.

We now understand what a smart contract really is – it is a sequence of instructions (bytecode) stored in an account and executed by each node whenever a transaction is directed to this account. As a smart contract can again invoke other smart contracts, you should think of the transaction execution as a chain – there is an initial transaction, which is always coming from an EOA and signed using a private key, this transaction can invoke a smart contract which in turn can invoke another smart contract and so forth. The terminology is a bit confusing at this point because different sources use slightly different definitions of what a transaction and a message call is, but I tend to think of each step in the chain as a message call and a transaction as the first step, which is distinguished from the other steps by being created and signed by an EOA, i.e. typically a human or a programm outside of the chain operated by a human.

Let us summarize what we have learned today.

  • When a node processes a block, it updates the copy of the blockchain state that it holds. The consensus mechanism makes sure that the updates done by all nodes agree
  • As part of these updates, the node processes all transactions included in the block
  • When a transaction has a non-zero value, this includes the transfer of the corresponding amount of Ether from the sender to the recipient
  • If the recipient account of a transaction has a non-zero code, i.e. is a smart contract, then the EVM built into the node will interpret this smart contract as bytecode that will be executed
  • All changes to the state made by the contract are incorporated into the state update and therefore become part of “the” global blockchain state
  • Thus we can think of a smart contract as a sequence of instructions that are stored in the blockchain and executed by the blockchain network when being triggered by a transaction

How are smart contracts created? Technically, everybody can assemble a smart contract and deploy it into the blockchain. To do this, you will have to submit a transaction containing your code (or, more precisely, code that when being executed returns your code) which has the zero address as recipient address. If during transaction execution, the node hits upon such a transaction, it will determine a contract address – this is the address to which we have to send a transaction to trigger the smart contract – from the address of the sender and the nonce of the sender, and store the contract in the code field of the state of this address.

Now to develop a smart contract, you will typically not write bytecode yourself, similar to JVM bytecode that is typically created by a Java compiler. Several high-level programming languages have been proposed over time to ease the creation of smart contracts, the most popular one being Solidity. In addition, a huge number of development tools, frameworks and platforms have been created that make the creation of a smart contract very easy. In the next post, we will go through some of these tools which will put us in a position to eventually write, compile, deploy and run our first smart contract.

Basics of the Ethereum blockchain

Today we will take a closer look at the Ethereum blockchain and discuss its most important structures, namely transactions, blocks and state. I assume that you are familiar with the basics of the blockchain technology, if not, I suggest that you read a few of my earlier posts on blocks, transactions and mining. This will be a long post, we will go a bit deeper than the usual introductions that you might have seen. As a consequence, this post is a bit more theoretical, we will get to more practical exercises soon once we have mastered the basics.

A short history of the Ethereum blockchain

As of today, the real identity of Satoshi Nakamoto, the author of the bitcoin white paper, is unknown, even though different people have claimed to be Satoshi over time. The origin of the Ethereum blockchain is far less mysterious. In fact, the Ethereum white paper that defines the basic structures and ideas of the Ethereum blockchain, was published in 2013 by Vitalik Buterin. Subsequently, its formal specification, known as yellow paper and working out the ideas presented in the white paper, was developed in 2014 (the initial commit on GitHub by Gavin Wood is from April 2014). The Ethereum foundation was established in the same year, and in 2015, the Ethereum network was launched with the creation of the first block, known as the genesis block.

Since then, Ethereum has been under constant development. Changes to the protocol are controlled by a formal process, based on EIPs (Ethereum improvement proposals). Several clients have been developed over time, like Geth in Go, OpenEthereum in Rust or the Hyperledger Besu client in Java.

At the time of writing, Ethereum is transitioning from a Proof-of-work consensus mechanism to a Proof-of-Stake (PoS) mechanism as part of the next major version of the protocol commonly referred to as Ethereum 2.0. With PoS, special nodes called validators are taking over the process of reaching consensus on the order of transactions by creating and validating new blocks. To become a validator, you have to invest a certain stake of digital currency that you lose if you misbehave. The intention of this change is to reduce the environmental footprint of the mining process, reduce transaction fees and – by supporting sharding – increase scalability. Even though the Beacon chain, which is the foundation for the new PoS approach, is already operational as of August 2021, the final transition will still take some time and is expected to happen at some point late in 2021 or early in 2022. As of today, the Rinkeby test network is already running a proof-of-authority (PoA) consensus algorithm known as clique, see EIP-255, but the final Ethereum 2.0 chain will be based on a protocol known as Casper (see for instance this paper or this paper on the Arxiv for more details on this)

Addresses and accounts

On a certain level, the Ethereum blockchain is conceptually very simple – there is, at any point in time, a state, describing among other things the balances of the participants in the network, there are transactions changing the state and there are blocks that group transactions for the purpose of achieving consensus on the order of transactions. And, of course, there are addresses and accounts that represent the participants in the network.

An account represents a (typically human) participant in the Ethereum network. Accounts are not stored in a central place, there is no such thing as “signing up” for an account. Instead, an account is simply a randomly generated public and private key pair, more precisely an ECDSA key pair with a 32 byte private key and a 64 byte public key (which, as always with ECDSA, can be derived from the private key). Everyone can create an account by simply creating such a key pair and use that account to build and submit transactions to the network. There is no central mechanism that makes sure that the same account is not used by two different actors, but given the length of the key (32 bytes, i.e. 256 bit) this is highly unlikely.

Associated with every account is the address, which is defined as the rightmost 160 bit (i.e. 20 bytes) of the hash value of the public key. Ethereum uses the Keccak hashing algorithm (which is not exactly what NIST has standardized as SHA3-256, but close to it). Again, theoretically two different public keys could produce the same address due to a collision, but in practice this possibility is mostly ignored, and the address is considered to be in a one-to-one relation to the key pair.

Actually, we have been cheating a bit at this point. The relation between accounts and addresses described above is only valid for accounts that are owned by (typically) human actors external to the blockchain. However, we have already learned that Ethereum offers the possibility to store and run smart contracts, i.e. pieces of code that execute on the blockchain. These contracts are represented by addresses as well, but there is no private key behind these addresses (and consequently, a smart contract can, by itself, not create and sign a transaction). To distinguish these two types of accounts, accounts that hold a key pair are sometimes called externally owned account (EOA), while addresses occupied by a smart contract are called contract accounts.

State

The Ethereum state is organized as key-value pairs, where the key is the address and the value is again a complex data structure which consists of the following fields.

  • First, there is the nonce. The nonce is a counter that is increased with every transaction that originates from this address (i.e. account), and we get back to its role when discussing transactions below
  • Then, there is a balance, which reflects the current amount of Ether (the native currency of the Ethereum blockchain) owned by the account
  • Next, there is a field called code, which, if the account is a contract account, holds the bytecode of the smart contract
  • Finally, there is a field called storage. This is again a set of key-value pairs that a smart contract can use as persistent storage – we will get to this when we learn more about smart contracts.

Technically, the state is not really stored in the blockchain. Instead, the blockchain contains the initial state (in block zero of the chain, i.e. the genesis block), and all transactions. As the state can only be changed as part of a transaction, this is sufficient to reconstruct the state from the blockchain data. This is in fact what a full Ethereum client (the piece of software making up a node – some people including myself find the term client for this a bit confusing, given that we will learn later that most of these clients actually act as a server) does when it is initially started – it gets all blocks from the current blockchain, replays the transactions which are part of the blocks and uses this to reconstruct the state, which is then stored in a database on disk.

Note that there are some clients, the so-called light clients, which do not actually go through this process, but only download block headers and access full clients to be able to read the state if needed. Even though the state is not stored as part of the blocks, each block contains a hash value built from the state. This (and the fact that the state is organized in a data structure called a Merkle Patricia trie) allows even a light clients to access and validate the state for a given address.

Ether and gas

Before proceeding to discuss transactions, it is helpful to understand what Ether and gas are. Ether is nothing but the native digital currency of the Ethereum blockchain, similar to what bitcoin is to the bitcoin blockchain. The smallest amount of Ether that can be transferred is one Wei, and 1018 Wei is equivalent to one Ether. The balance, for instance, that is stored as part of the state of an account, is the balance in Wei. Another unit that is frequently used is the GWei, which is 109 Wei, so that 109 GWei is again one Ether.

Ether is the currency used to make payments in the Ethereum blockchain. One of the things you pay for is the processing, i.e. validation and inclusion in a block, of transactions. Similar to the bitcoin network, miners (or, after transitioning to the proof-of-stake algorithm, validators) are rewarded in Ether. The way how the fees are calculated, however, is a bit more subtle than with the bitcoin network. The reason behind this is that as part of a transaction, a smart contract might have to be executed, and to avoid DoS attacks, we want the fees to depend on the complexity of that smart contract.

To achieve this, the Ethereum blockchain uses a measurement for the complexity of a transaction called gas. Every transaction consumes a certain amount of gas. A simple transfer, for instance, requires 21000 units of gas. When a smart contract is executed as part of a transaction, every instruction consumes a certain amount of gas as well.

To price gas and to therefore price transactions, every transaction contains a gas price. This is the amount of Ether (or Wei) that the participant posting the transaction is willing to pay per unit of gas consumed by the transaction. Miners can use the gas price to select the transactions that are most beneficial for them, so that transactions with a high gas price tend to be mined faster, while transactions with a low gas price can stay pending for a long time – potentially forever. Thus choosing a reasonable gas price is essential for the successful processing of a transaction, and clients typically use a heuristic to propose a gas price for a given transaction.

In addition to the amount of gas consumed by a transaction and the gas price, there is also a gas limit that the originator of a transaction can define. Especially when executing an unknown smart contract, the amount of gas needed can be hard to predict, and therefore the gas limit serves as a safeguard to make sure that a malicious contract cannot consume an unlimited amount of gas. If during the execution of a transaction, the gas limit is exceeded, the transaction is reverted – note, however, that the gas used up to this point is lost. Therefore choosing the gas limit is also vital to ensure proper processing of a transaction.

Let us go through an example to see how this works. Suppose that you run a comparatively complex transaction, like the deployment of a smart contract, that consumes 2,409,371 units of gas. Suppose further that the current gas price is 1 GWei. Then the amount of Ether you would have to pay for this transaction is

1 GWei = 1000000000 Wei x 2409371 = 2.409371 x 1015 Wei = 0.002409371 Ether

Assuming a price of 2.600 USD per Ether, this would cost you roughly 6.26 USD. In reality, however, the real gas price is even higher (today it was 25 GWei), so this would be more than 150 USD. This is an expensive transaction, and most transactions are cheaper. Still, transaction costs can be significant on the Ethereum mainnet (the example is actually taken from the Rinkeby test network, where of course Ether costs nothing, but just to give you an idea).

If, however, you submit the transaction with a gas limit below 2,409,371, the transaction would not complete and would be reverted, resulting in a loss of the gas consumed so far.

To make things a bit more complicated, there is actually a second gas limit – the gas limit per block. This limit is set by the miners and determines an upper limit for the amount of gas that all transactions included in a block can consume. The block gas limit is a field in the block header, and, according to the yellow paper, section 4.3.4 and section 6.2, a node will verify that for each block:

  • the block gas limit of the current block differs from the block limit of the parent block by at most roughly 0,1% (1/1024), so that miners can only change the block gas limit within this range with every new block
  • when a transaction is added to a block, the sum of the gas limit of this transaction and the gas used by the transactions already part of the block must not exceed the block gas limit

At the time of writing, the gas limit of a block on the mainnet is roughly 15 Mio. units of gas, and the utilisation is pretty efficient, meaning that most blocks seem to spend an amount of gas which is only barely below the block gas limit.

Transactions

Let us now take a closer look at an Ethereum transaction. Here is a diagram that shows the fields that make up a transaction.

Most of these fields should be clear by now. We have already discussed the gas price and the gas limit. The signature actually consists of three values, conventionally called v, r and s. Here, r and s are the components of the signature as in the usual ECDSA algorithm, and v is an additional value called the parity that can be used to unambiguously recover the public key from the signature – this explains why the public key is not part of the transaction. Note that therefore, the signature also implicitly contains the address of the sender of the transaction.

The value is the amount of Ether (in Wei) that is to be transferred from the sender to the recipient of the transaction, which can of course be zero. Finally, there is the nonce, which is a counter that needs to be incremented by one for each transaction that is generated for a given sender. This value is needed at different points during the execution of a transaction (see again the yellow paper, sections 6 and 7)

  • When a transaction is validated, the nonce of the transaction needs to be equal to the nonce stored in the state of the senders address. When a transaction is executed, the nonce in the senders state is incremented by one
  • When a smart contract is deployed as part of a transaction, the nonce determines, together with the sender address, the address of the smart contract

A miner will actually queue a transaction if it detects a gap in the nonce. Thus if you submit a transaction with nonce one and a transaction with nonce three, the miner will process transaction one, but will put the transaction with nonce three into a queue, assuming that a transaction with nonce two is still on its way through the network.

Also note that miners typically allow you to replace a transaction that is still pending by sending a new transaction with the same nonce. This is useful if your transaction is stuck because the gas price is too low – you can then send the transaction again, using a higher gas price and the same nonce, and the miner will replace the transaction in its pool of pending transactions with the new version. Of course, this is only possible as long as the transaction has not yet been included in a block. Some wallets also allow you to do the same to cancel a pending transaction – simply send a transaction with the same nonce and value zero. Note, however, that miners will only accept the replacement if the gas price of the new transaction is at most equal to that of the pending transaction, and to cancel a transaction, you will actually want to increase the gas price to make sure that the replacement is mined, not the original transaction.

Why is the nonce needed? One answer to this is that it avoids a form of malicious behavior known as replay attack. Suppose you create, sign and submit a transaction that transfers 100 ETH to Eve. The transaction is mined, included in a block and added to the chain, and Eve happily receives the 100 ETH. Now Eve is greedy – she takes the transaction from the block (which she can easily do, as the data is public) and submits exactly the same transaction once more – and again, and again – you see where this is going. The nonce avoids this – when Eve tries to submit the transaction again, the nonce of the state will already have increased as a consequence of the first transaction, and miners and validators will reject the second copy of the transaction.

We have not yet touched upon the two extra fields on the right hand side of the diagram. The first field, the data, is relevant for transactions that are ordinary transactions, i.e. transactions targeting an existing smart contract or a EOA. The field can be used to include arbitrary data in the transaction, for instance arguments for the invocation of a smart contract. The second field, called init, is only relevant if a transaction is used to deploy a smart contract. A transaction will serve as deployment when its to field is the zero address. In this case, the init field is supposed to contain byte code that will be executed, and the result of this byte code will be stored as new smart contract at the contract address determined by sender and nonce.

Blocks

After all these preparations, we are now ready to finally discuss blocks. In Ethereum, a block consists of three pieces – the block header, the list of transactions included in the block and a list of headers of other blocks, the so-called ommers (which is a gender-neutral term, sometimes these blocks are called uncle blocks). These components are displayed in the diagram below

The block header contains the following information (note that the order in the list and the diagram is not exactly the order in which the fields actually appear in the block, see the source code or the yellow paper for a full and more formal description)

  • First, there is a reference to the parent of the block, given by the hash value of the parent block. So the parent cannot be changed without breaking the chain, which is, after all, a characteristic property that you would expect from any blockchain. Formally, this is the Keccak hash of the RLP encoded parent block
  • Next, there is a hash value of the list of ommers that can be used to validate this data
  • The third field is the beneficiary, which is the address to which the mining reward for this block belongs
  • The next field is the hash value root of the Merkle-Patricia trie built from the state of all addresses. The presence of this field allows a light client to validate state information without having to download the entire chain
  • Similar to the hash value of the ommers list, the block header also contains a hash value of the tree of transactions included in this block
  • The next field is again the hash value of the root of a trie, this time the tree built from all transaction receipts. A transaction receipt is a data structure that describes the outcome of the process of validating a transaction. Similar to the state, it is not stored in the blockchain, but needs to be calculated by a client by replaying the transactions, and the presence of the root in the block header allows a client to validate that a receipt is not manipulated
  • A special part of a transaction receipt is the set of logs generated by the transaction. We will look at logs and events in a bit more detail when we talk about the Solidity programming language. To make it easier to scan the blockchain for specific log entries, the block header contains a Bloom filter of the log entries which is a special data structure that supports fast searching
  • The block header also contains the block number, i.e. the number of ancestors (the genesis block therefore has block number zero) and a creation timestamp
  • As discussed before, the block header contains the block gas limit along with the total gas used for this block, i.e. by all transactions in the block
  • A miner can use the extra data field to add at most 32 bytes of data to a block
  • Finally, there are the nonce, the mixed hash and the difficulty, which are used for the PoW algorithm (Ethereum uses an algorithm called ethash which aims at making the use of ASICS for mining more difficult by using large data structures that need to be manipulated in memory, see also appendix J of the yellow paper for a formal definition)

As you would expect, a miner that mines a new block is rewarded for this work. The reward consists of two components. First, the miner receives a base reward of currently 2 ETH for each newly mined block, regardless of the transactions contained in the block. Second, the miner receives transaction fees. The mechanism by which this happens is currently being changed by EIP-1599. Previously, a miner received the full transaction fees for all transactions in the block being mined. After implementation of the EIP, the fees will consist of a base fee that is allowed to vary slowly over time, and a priority fee that is the difference of the total transaction fees and the base fee. The miner will only receive the priority fee, and the base fee will be burned. The EIP also adds the base fee as an additional field to the block header.

Most of the above should sound familiar – but there is one detail that struck me on first reading, namely the role of the ommers. Why are they needed? The reason for including ommers is that in addition to the block reward that a miner receives for a new block, the miners (i.e. beneficiaries) of an ommer block referenced in a new block will also receive a reward, called the uncle reward.

The motivation behind this (you might want to read this paper for all the glorious details and how this impacts the rewards of miners) is as follows. Suppose you are a miner that is mining a new block, say A. At the same time, a second miner is mining another valid block, called B. Both blocks have the same parent P (the current tip of the canonical chain). Now, as the block mining rate on the Ethereum block chain is rather high, it can happen and will happen that A and B are found and distributed at roughly the same point in time. This will lead to a short fork, but after some time, the chain stabilizes and only A or B will become part of the canonical chain (the longest chain). Suppose that block B ends up being on the canonical chain – then your mining reward for block A is not part of “the” state any more, and your reward is lost.

However, suppose that you now mine a new block C, with parent B. Then, the stale block A will be an ommer of block C. If you manage to include A in the ommer list of block C, and block C makes it to the chain, you (being the beneficiary of block A) will still receive the uncle reward. The uncle reward is lower than the standard block reward, but at least the reward is not zero. In this sense, the uncle reward is a mechanism that fosters fast mining and rewards miners for producing blocks, even if this block does not make it to the canonical chain. In addition, a miner who includes ommers in a block also receives a small reward as an incentive to also include ommers created by other miners.

This closes todays post. Yes, this was a long post, but if you have followed me so far, you should have gained a solid understanding of the basic building blocks of the Ethereum block chain. Armed with this understanding, we will – in the next post – go ahead and learn more about the mysterious smart contracts that we have already touched upon several times.

The Ethereum blockchain, smart contracts and token

If you have followed my blog for some time, you might know that it started with a few posts on the bitcoin blockchain – about its foundations in elliptic curve cryptography, blocks, mining and transactions. The bitcoin blockchain has been established in 2009, and since then, a lot has happened in the blockchain world.

Maybe the most exciting new development are token – tradable coins that actually live on top of an existing blockchain. In particular non-fungible token are all the rage these days, and allow you to document ownership in a particular, uniquely identifiable asset like a piece of classical or digital art in the blockchain. Everybody who has access to a blockchain can create a token, and, according to Investopedia, more than 200.000 of these token did already exist by the end of 2019.

Technically, a token is nothing else but an application that uses persistent storage to store the information who owns which token respectively how many token. The point of a token is that this is not simply an ordinary application running in some data center, which might raise the usual concerns about whether you can trust the programmer and the operator, but is a so-called smart contract – an application whose code is stored in the blockchain and which is in a certain sense running on top of the blockchain and uses the blockchain as storage.

Thus, token ownership is stored in the blockchain, and as such, is subject to the usual guarantees in terms of integrity and durability that a blockchain has to offer. As the program code itself is also stored in the blockchain, you can also trust that it is not manipulated after initial deployment, and, as every node can run the code independently, the consensus mechanism of the blockchain also makes a manipulation during program execution at least extremely difficult.

Token are an important, but by far not the only application of smart contracts. You could, for instance, implement a smart contract that allows you to cast votes based on blockchain technology – the technology will make sure that every participant can only vote once, that votes are correctly accounted for and cannot be manipulated, and that the entire voting process is documented transparently in the blockchain. Or you can build a smart contract that acts as a deposit for collateral, where the logic implemented in the contract makes sure that the collateral is only released if a certain condition is met. There are broker applications that allow you to trade digital currency without the need for a trusted third party, fully decentralized organisations (DAO), whose members would, for instance, jointly invest into startups and vote transparently on the usage of funds, and many more applications of smart contracts. There are even games – check out Crypto Kitties, one of the first NFTs that was implemented.

Not every blockchain supports smart contracts. The bitcoin blockchain, for instance, does not (even though there is some scripting built into the validation process). The most popular (and, to my understanding, the first) blockchain that introduced smart contracts is the Ethereum blockchain, which was initially designed in 2013 and launched as a project in 2015. The Hyperledger Fabric blockchain has a similar concept (although smart contracts are technically quite different from what Ethereum does), and the same is true for Corda or EOS.

A couple of weeks back I became curious and wanted to understand how exactly a token works. I started to dive a bit into the Ethereum blockchain, smart contracts, token standards, dApps and Solidity, and, as always, decided to document my findings in a short series on this blog. If you follow along, here is what you will learn.

At the end, we will put everything together, mint our own NFT and build a frontend that will act as a wallet, allow you to list the token that you and others own and to trade token. As always, I will make the corresponding code available in my GitHub account so that you can get your hands dirty and directly jump into coding.

So let us get started and dive into the Ethereum blockchain – what it is, why it is different from the bitcoin blockchain and how it serves as basis for smart contracts and token. Watch out for my next post to appear which will talk about this.

How the number of bitcoins is limited

In some of the previous posts, we did already hit upon the file chainparams.cpp in the source code of the bitcoin reference client. It is interesting to go through this and understand the meaning of the various parameters defined there. One of them should catch your attention:

class CMainParams : public CChainParams {
public:
    CMainParams() {
        strNetworkID = "main";
        consensus.nSubsidyHalvingInterval = 210000;

What does this parameter mean? It is in fact not used awfully often, apart from some unit tests I could only locate it once, namely in validation.cpp.

CAmount GetBlockSubsidy(int nHeight, const Consensus::Params& consensusParams)
{
    int halvings = nHeight / consensusParams.nSubsidyHalvingInterval;
    // Force block reward to zero when right shift is undefined.
    if (halvings >= 64)
        return 0;

    CAmount nSubsidy = 50 * COIN;
    // Subsidy is cut in half every 210,000 blocks which will occur approximately every 4 years.
    nSubsidy >>= halvings;
    return nSubsidy;
}

The output of this function plays an important role when a new block is mined by mine.cpp – this is the amount (in Satoshis) that a miner earns in addition to the fees! Put differently, this is the amount of bitcoins that are created when a block is mined.

What this code tells us is that the amount of bitcoin added during mining starts with 50 and is divided by two every 210.000 blocks. So the amount of bitcoins mined is a given by the formula

210000 \cdot 50 + 210000 \cdot 25 + 210000 \cdot 12.5 + \dots = \sum_{n=0}^\infty  210000 \cdot 50 \cdot  \frac{1}{2}^n

The mathematicians among us will recognize this as a geometric series

210000 \cdot 50 \cdot \sum_{n=0}^\infty q^n

with q = 0.5. This series converges, and its value is

210000 \cdot 50 \cdot 2 = 21 \cdot 10^{6}

Therefore the amount of bitcoins that are created by mining – and thus the overall supply of bitcoins –  can never exceed roughly 21 million bitcoins. You might have heard that before: bitcoins are by designed with a controlled supply which is guaranteed by the continuous reduction of the subsidity as the number of blocks increases (and not because the value of a bitcoin transaction is stored in a 64 bit integer – in fact this would explain why the value of a single transaction output cannot exceed a certain value, but not why the total sum of all ever issued bitcoins is limited). This is the point in the source code that is responsible for this.

Of course I am cheating a bit – the value of a bitcoin is discrete, not a real number. Mining will stop if the value of the block subsidity falls below one Satoshi, as this is the smallest number that can be represented. Let us see when this happens. The subsidity is given by the formula

s = \frac{5 \cdot 10^{9}}{2^n}

where n is obtained by dividing the block height (i.e. length of the chain) by 210000 and converting to an integer. Solving s = 1 for n, we obtain

n = \log_2 (5 \cdot 10^{9}) \approx 32.2

Therefore the bitcoin amount created with a block will drop to zero with block

m = 210000 * 33 = 6.930.000  .

The total number of bitcoins created until then can be approximated (ignoring rounding) by the partial sum

210000 \cdot 50 \cdot \sum_{n=0}^{32} q^n = \frac{50 \cdot 10^{9}(1 - q^{33})}{1 - q}

which gives 20999999.997 bitcoins, i.e. almost exactly 21 million bitcoins as expected. We can also estimate when this will have happened. Looking at blockchain.info, we see that at the time of writing, approximately 513.000 blocks have already been mined. So we still need 6.418.000 blocks. A block is generated roughly every 10 minutes, so there are 6 additional blocks per hour and therefore 52560 blocks being added per year. Thus it will take roughly 122 years from now until all these blocks have been mined, i.e. this will happen somewhere around the year 2140. So still some time to go until then…

If you do not trust the math, you could also simulate this in a little Python program. In fact, this will give you a bit less than what we have calculated above, as the subsidity is rounded to an integer during the calculation, which our geometric series above does not properly reflect.

COIN = 10**8
nHeight = 0
btc = 0.0
while True:
    halvings = nHeight // 210000
    subsidity = (50*COIN) >> halvings
    btc += subsidity
    if subsidity < 1:
        break
    nHeight += 1

print("Total bitcoin amount: ", btc / 10**8 )

You can also easily determine how many bitcoin are available at each point in time. At 513.000 blocks, these are roughly 16.9 million BTC, which, at a price of 10.000 USD, is equivalent to a market capitalization of roughly 160 billion USD.

Mining bitcoins with Python

In this post, we will learn to build a very simple miner in Python. Of course this miner will be comparatively slow and limited and only be useful in our test network, but it will hopefully help to explain the principles behind mining.

When we want to mine a block, we first need some information on the current state of the blockchain, like the hash of the current last block, the current value of the difficulty or the coin base value, i.e. the number of BTC that we earn when mining the block. When we are done building the block, we need to find a way to submit it into the bitcoin network so that it is accepted by all nodes and permanently added to the chain.

If we were member of a mining pool, there would be a mining server that would provide us the required information. As we want to build a solo mining script, we need to communicate with bitcoin core to get that information and to submit our final block. Fortunately, the RPC interface of bitcoin core offers methods to facilitate that communication that were introduced with BIP22.

First, there is the method getblocktemplate. It will deliver all the required information that we need to build a valid block and even propose transactions that we should include in the block. These transactions will be taken from the so called mempool which is a collection of transactions that the bitcoin server knows which have not been added to a block yet (see miner.cpp/BlockAssembler::addPackageTxs in the bitcoin core source code for details on how the selection process works).

If the client is done building the block, it can submit the final block using the method submitblock. This method will run a couple of checks on the block, for instance that it can be correctly decoded, that the first transaction – and only the first – is a coinbase transaction, that it is not a duplicate and that the proof-of-work is valid. If all the checks pass, it will add the block to the local copy of the blockchain. If a check fails, it will return a corresponding error code to the caller.

With that understanding, let us now write down the processing logic for our simple miner, assuming that we only want to mine one additional block. First, we will use getblocktemplate to get the basic parameters that we need and transaction that we can include. Then we will create a new block and a new coinbase transaction. We then add the coinbase transaction followed by all the other transactions to the block.

Once we have that block, we enter a loop. Within the loop, we calculate the hash of the block and compare this against the target. If we can meet the target, we are done and submit the newly created block using submitblock. Otherwise, we increment the nonce and try again.

We have discussed most of these steps in some details in the previous posts, with one exception – coinbase transactions. A coinbase transaction is, by definition, a transaction which generates bitcoin because it has valid outputs, but does not spend any UTXOs. Technically, a coinbase transaction is a transaction which (see CTransaction::IsCoinBase())

  • has exactly one transaction input
  • the previous transaction ID in this input is zero (i.e. a hexadecimal string consisting of 32 zeros “00”)
  • the index of the previous output is -1 (encoded as  0xFFFFFFFF)

As it does not refer to any output, the signature script of a coinbase transaction will never be executed. It can therefore essentially contain an arbitrary value. The only restriction defined by the protocol is described in BIP34, which defines that the first bytes of the signature script should be a valid script that consists of the height of the new block as pushed data. The remainder of the coinbase signature script (which is limited to 100 bytes in total) can be used by the miner at will.

Many miners use this freedom to solve a problem with the length of the nonce in the block header. Here, the nonce is a 32 bit value, which implies that a miner can try 232, i.e. roughly 4 billion different combinations. Modern mining hardware based on ASICs can search that range within fractions of seconds, and the current difficulty is so high that it is rather likely that no solution can be found by just changing the nonce. So we have to change other fields in the block header to try out more hashes.

What are good candidates for this? We could of course use the block creation time, but the bitcoin server validates this field and will reject the block if it deviates significantly from the current time. Instead miners typically use the coinbase signature script as an extra nonce that they modify to increase the range of possible hashes. Therefore the fields after the height are often combinations of an extra nonce and additional data, like the name of the mining pool (increasing the extra nonce is a bit less effective than increasing the nonce in the block header, as it changes the hash of the coinbase transactions and therefore forces us to recalculate the Merkle root, therefore this is most often implemented as an outer loop, with the inner loop being the increments of the nonce in the block header).

To illustrate this, let us look at an example. The coinbase signature script of the coinbase transaction in block #400020 is:

03941a060004c75ccf5604a070c007089dcd1424000202660a636b706f6f6c102f426974667572792f4249503130302f

If we decode this, we find that the first part is in fact a valid script and corresponds to the following sequence of instructions (keep in mind that all integers are encoded as little endian within the script):

OP_PUSH 400020
OP_0
OP_PUSH 1456430279
OP_PUSH 130052256
OP_PUSH 7350439741450669469

As specified by BIP34, the first pushed data is the height of that block as expected. After the OP_0, we see another push instruction, pushing the Unix system time corresponding to Thu Feb 25 20:57:59 2016, which is the creation time of the block.

The next pushed data is a bit less obvious. After looking at the source code of the used mining software, I assume that this is the nanoseconds within the second returned by the Unix system call clock_gettime. This is then followed by an eight byte integer (7350439741450669469) which is the extra nonce.

The next part of the signature script is not actually a valid script, but a string – a newline character (0xa), followed by the string “ckpool”. This is a fixed sequence of bytes that indicates the mining software used.

Finally, there is one last push operation which pushes the string “/Bitfury/BIP100/”, which tells us that the block has been mined by the Bitfury pool and that this pool supports BIP100.

Enough theory – let us put this to work! Using the utility functions in my btc Python package, it is now not difficult to write a short program that performs the actual mining.

However, we need some preparations to set up our test environment that are related to our plan to use the getblocktemplate RPC call. This call performs a few validations that can be a bit tricky in a test environment. First, it will verify that the server is connected, i.e. we need at least one peer. So we need to start two docker container, let us call them alice and bob again, and find out the IP address of the container bob in the Docker bridget network. The three following statements should do this for you.

$ docker run --rm -d -p 18332:18332 --name="alice" alice
$ docker run --rm -d  --name="bob" bitcoin-alpine
$ docker network inspect bridge | grep  -A 4  "bob" - | grep "IPv4" -

Assuming that this gives you 172.17.0.3 (replace this with whatever the result is in your case), we can now again use the addnode RPC call to connect the two nodes.

$ bitcoin-cli --rpcuser=user --rpcpassword=password -regtest addnode "172.17.0.3" add

The next validation that the bitcoin server will perform when we ask for a block template is that the local copy of the blockchain is up to date. It does by verifying that the time stamp of the last block in the chain is less than 24 hours in the past. As it is likely that a bit more time has passed since you have created the Alice container, we therefore need to use the mining functionality built into bitcoin core to create at least one new block.

$ bitcoin-cli --rpcuser=user --rpcpassword=password -regtest generate 1

Now we are ready to run our test. The next few lines will download the code from GitHub, create one transaction that will then be included in the block. We will create this transaction using the script SendMoney.py that we have already used in an earlier post.

$ git clone https://github.com/christianb93/bitcoin.git
$ cd bitcoin
$ python SendMoney.py
$ python Miner.py

You should then see an output telling you that a block with two transactions (one coinbase transaction and the transaction that we have generated) was mined, along with the previous height of the blockchain and the new height which should be higher by one.

Let us now verify that everything works. First, let us get the hash of the current last block.

$ bitcoin-cli --rpcuser=user --rpcpassword=password -regtest getchaintips
[
  {
    "height": 109,
    "hash": "07849d5c8ddcdc609d7acc3090bc48bbe4403c36008d46b5a291185334efe1bf",
    "branchlen": 0,
    "status": "active"
  }
]

Take the value from the hash field in the output and feed it into a call to getblock:

$ bitcoin-cli --rpcuser=user --rpcpassword=password -regtest getblock "07849d5c8ddcdc609d7acc3090bc48bbe4403c36008d46b5a291185334efe1bf"
{
  "hash": "07849d5c8ddcdc609d7acc3090bc48bbe4403c36008d46b5a291185334efe1bf",
  "confirmations": 1,
  "strippedsize": 367,
  "size": 367,
  "weight": 1468,
  "height": 109,
  "version": 536870912,
  "versionHex": "20000000",
  "merkleroot": "8769987458af75adc80d6792848e5cd5cb8178a9584157bb4be79b77cda95909",
  "tx": [
    "7ba3186ca8e5aae750614a3211422423a0a217f5999d0de6dfeb8968aeb01900", 
    "1094360149626a421b4ddbc7eb58a815762700316c36407770b96ffc36a7735b"
  ],
  "time": 1522952768,
  "mediantime": 1521904060,
  "nonce": 1,
  "bits": "207fffff",
  "difficulty": 4.656542373906925e-10,
  "chainwork": "00000000000000000000000000000000000000000000000000000000000000dc",
  "previousblockhash": "2277e40bf4c0ebde3fb5f38fcbd384e39df3471ad192cc46f66ea8d8d96327e7"
}

The second entry in the list tx should now match the ID of the newly created transaction which was displayed when executing the SendMoney.py script. This proves that our new transaction has been included in the block.

Congratulations, you have just mined 50 BTC – unfortunately only in your local installation, not in the production network. Of course, real miners work differently, using mining pools to split the work between many different nodes and modern ASICS to achieve the hash rates that you need to be successful in the production network. But at least we have built a simple miner more or less from scratch, relying only on the content of this and the previous posts in this series and without using any of the many Python bitcoin libraries that are out there.

This concludes my current series on the bitcoin blockchain –  I hope you enjoyed the posts and had a bit of fun. If you want to learn more, here are a few excellent sources of information that I recommend.

  1. Of course the ultimative source of information is always the bitcoin core source code itself that we have already consulted several times
  2. The Bitcoin wiki contains many excellent pages on most of what we have discussed
  3. There is of course the original bitcoin paper which you should now be able to read and understand
  4. and of course there are tons of good books out there, I personally liked Mastering Bitcoin by A. Antonopoulos which is also available online

 

 

The difficulty in the bitcoin protocol

Mining is a challenge – a miner has to solve a mathematical puzzle to create a valid block and is rewarded with a certain bitcoin amount (12.5 at the time of writing) for investing electricity and computing power. But technology evolves, and the whole mining process would be pointless if the challenge could not be adapted dynamically to make sure that mining power and the difficulty of the challenge stay in balance. In this post, we will look in a bit more detail into this mechanism.

We have already seen that a block header contains – at character 144 – a four byte long number that is called bits in the source code and which seems to be related to the difficulty. As an example, let us take the block with the hash

0000000000000000000595a828f927ec02763d9d24220e9767fe723b4876b91e

You can get a human readable description of the block at this URL and will see the following output.

BitcoinBlockExample

This output has a field bits that has the value 391129783. If you convert this into a hexadecimal string, you obtain 0x17502ab7, using for instance Pythons hex function. Now this function will give you the big-endian representation of the integer value. To convert to little endian as it can be found in the serialized format of a bitcoin block, you have to reverse the order byte wise and get b72a5017. If you now display the raw format of the block using this link and search for that string, you will actually find it at character 144 at expected.

Now on the screenshot above, you also see the difficulty displayed, which is a floating point number and is 3511060552899.72 in this case. How is that difficulty related to the bits?

As so often, we can find the answer in the bitcoin core source code, starting at the JSON RPC interface. So let us look at the function blockheaderToJSON in rpc/blockchain.cpp that is responsible for transforming a block in the internal representation into a human readable JSON format. Here we find the line

result.push_back(Pair("difficulty", GetDifficulty(blockindex)));

So let us visit the function GetDifficulty that we find in the same file, actually just a few lines up. Here we see that the bits are actually split into two parts. The first part, let me call this the size is just the value of the top 8 bits. The second part, the value of remaining 24 bits, is called the coefficient. Then the difficulty is calculated by dividing the constant 0xFFFF by the coefficient and shifting by 29 – size bits, i.e. according to the formula

\text{difficulty} = 256^{(29 - \text{exponent})} \cdot (65535.0 / \text{coefficient})

In Python, we can therefore use the following two lines to extract the difficulty from a block header in raw format.

nBits = int.from_bytes(bytes.fromhex(raw[144:152]), "little")
difficulty = 256**(29 - (nBits >> 24))*(65535.0 / ((float)(nBits & 0xFFFFFF)))

If you try this out for the transaction above, you will get 3511060552899.7197. Up to rounding after the second digit after the decimal point, this is exactly what we also see on the screenshot of blockchain.info. So far this is quite nice…

However, we are not yet done – this is not the only way to represent the difficulty. The second quantity we need to understand is the target. Recall that the mathematical puzzle that a miner needs to solve is to produce a block with a hash that does not exceed a given number. This number is the target. The lower the target, the smaller the range of 256 bit numbers that do not exceed the target, and the more difficult to solve that puzzle. So a low target implies high difficulty and vice versa.

In fact, there are two target values. One target is part of every block and stored in the bits field – we will see in a minute how this works. The second target is a global target that all nodes in the bitcoin network share – this value is updated independently by each node, but based on the same data (the blockchain) and the same rules so that they arrive at the same conclusion. A block considered valid if solves the mathematical puzzle according to the target which is stored in itself and if, in addition, this target matches the global target.

To understand how this works and the exact relation between target and difficulty, it is again useful to take a look at the source code. The relevant function is the function CheckProofOfWork in pow.cpp which is surprisingly short.

bool CheckProofOfWork(uint256 hash, unsigned int nBits, const Consensus::Params& params)
{
    bool fNegative;
    bool fOverflow;
    arith_uint256 bnTarget;

    bnTarget.SetCompact(nBits, &fNegative, &fOverflow);

    // Check range
    if (fNegative || bnTarget == 0 || fOverflow || bnTarget > UintToArith256(params.powLimit))
        return false;

    // Check proof of work matches claimed amount
    if (UintToArith256(hash) > bnTarget)
        return false;

    return true;
}

Let us try to understand what is going on here. First, a new 256 bit integer bnTarget is instantiated and populated from the bits (that are taken from the block) using the method SetCompact. In addition to a range check, it is then checked that the hash of the block does not exceed the target – this the mathematical puzzle that we need to solve.

The second check – verifying that the target in the block matches the global target – happens in validation.cpp in the function ContextualCheckBlockHeader. At this point, the function GetNextWorkRequired required is called which determines the correct value for the global target. Then this is compared to the bits field in the block and the validation fails if they do not match. So for a block to be accepted as valid, the miner needs to:

  • Actually solve the puzzle, i.e. provide a block with a hash value equal to or smaller than the current global target
  • Include this target as bits in the generated block

This leaves a couple of questions open. First, we still have to understand how the second step – encoding the target in the bits field – works. And then we have to understand how the global target is determined.

Let us start with the first question. We have seen that the target is determined from the bits using the SetCompact method of the class uint256. This function is in fact quite similar to GetDifficulty studied before. Ignoring overflows and signs for a moment, we find that the target is determined from the bits as

\text{target} = \text{coefficient}\cdot 2^{8(\text{exponent}-3)}

If we compare this to the formula for the difficulty that we have obtained earlier, we find that there a simple relation between them:

\text{target} = \text{difficulty}^{-1}\cdot \text{std\_target}

where std_target is the target corresponding to difficulty 1.0 and given by 0xFFFF shifted by 208 bits to the left, i.e. by 0xFFFF with 52 trailing zeros. So up to a constant, target and difficulty are simply inverse to each other.

Now let us turn to the second question – how can the nodes agree on the correct global target? To see this, we need to look at the function GetNextWorkRequired. In most cases, this function will simply return the bits field of the current last block in the blockchain. However, every 2016 blocks, it will adjust the difficulty according to the following rule.

First, it will use the time stamps in the block to calculate how much time has passed since the block which is 2016 blocks back in the chain has been created – this value is called the actual time span. Then, this is compared with the target time span which is two weeks. The new target is then obtained by the formula (ignoring some range checks that are done to avoid too massive jumps)

\text{target}_{new} = \text{target}_{old} \cdot \frac{\text{actual time span}}{\text{target time span}}

Thus if the actual time span is smaller than the target time span, indicating that the mining power is too high and therefore the rate of block creation is too high, the target decreases and consequently the difficulty increases. This should make it more difficult for the miners to create new blocks and the block creation rate should go down again. Similarly, if the difficulty is too high, this mechanism will make sure that it is adjusted downwards. Thus the target rate is 2016 blocks being created in two weeks, i.e. 336 hours, which is a rate of six blocks per hours, i.e. 10 minutes for one block.

Finally, it is worth mentioning that there is lower limit for the difficulty, corresponding to a higher limit for the target. This limit is (as most of the other parameters that we have discussed) contained in chainparams.cpp and specific for the three networks main, net and regtest. For instance, for the main network, this is

consensus.powLimit = uint256S("00000000ffffffffffffffffffffffffffffffffffffffffffffffffffffffff");

whereas the same parameter for the regression testing network is

consensus.powLimit = uint256S("7fffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff");

This, together with the target of the genesis block (which determines the target for the first 2016 blocks) (bits = 0x207fffff ) determines the initial difficulty in a regression test network. Let us verify this (I knew I would somehow find a reason to add a notebook to this post…)

Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.

We are now well equipped to move towards our final project in this series – building a Python based miner. The last thing that we need to understand is how the miner operates and communicates with the bitcoin server, and this will be the topic of the next post.

Merkle trees

Suppose you are given a long file and wanted to use hashing to verify the integrity of the file. How would you do this in the most efficient way?

Of course, the first thing that might come to your mind is to simply take a hash of the entire file. This is very easy, but has one major disadvantage – if the file is damaged, there is no way to tell which part of the file has been damaged. If you use such a mechanism for instance to verify the integrity of a large download, you will have to go back and download the entire file again, which is very time consuming.

A smarter approach is to split the file into data blocks. You then calculate a hash for each data block to obtain a hash list. Finally, you could concatenate all the hashes into one large string and take the hash of that string. If a recipient of the file has access to only this top level hash, she could already verify the integrity of the file by repeating this procedure and comparing the top level hash with the hash created by the sender.

If that check fails, there is no need to re-transmit the entire file. Instead, the recipient could obtain a copy of the hash list itself and compare the hashes block by block until the faulty block is detected. This block can then be exchanged and the procedure can be repeated to verify that the replacement is valid. The bittorrent protocol uses this mechanism to detect inconsistencies in transferred files.

With a simple hash list like this, you would have to search the entire hash list in order to determine the faulty piece, requiring a time linear in the number of blocks. Searching can be improved greatly by organizing the hashes not into a list, but into a (binary) tree – and this data structure is called a hash tree or Merkle tree.

To illustrate the principle, suppose you wanted to build a hash tree for a file consisting of eight blocks, labeled 1 – 8.

HashTree

You would then proceed as follows, as indicated in the image above. First, you compute the hash value  for all blocks and assign them as values to the eight leaves of the tree. Then, you concatenate the values of leaves 1 and 2 and take the hash value of the result. This gives you the value for the parent node – called a in the picture above – of 1 and 2. You repeat the same procedure for the leaves 3 and 4, this gives you the value for the node b. Then you concatenate the values of a and b and take the hash value of the result, this gives you the value of the node c. Repeat this procedure for the right part of the tree and assign the result to node d. Finally, concatenate the values of node c and d and take the hash – this will be the value of the root node called the Merkle root. Thus, in a binary Merkle tree, the value of each node is defined to be the hash of the concatenation of the values of its two child nodes.

How is a Merkle tree realised in the bitcoin reference implementation? The currently relevant code is in consensus/merkle.cpp. This implementation is already optimized, the original implementation used to be in primitives/block.cpp and is still available on GitHub.

As the comments in the source code explain, at any level of the tree, the last node is added at the end again if needed to make sure that the number of nodes is even at each level – the algorithm that I have illustrated above only works if the number of leaves is a power of two. The actual implementation first builds a list of the leaves. It then goes through the list and for any two subsequent elements, it adds an element at the end which is the hash of the concatenation of these two elements. The last element is then the Merkle root that we are looking for.

Let us now see how a straightforward implementation looks like in Python.

Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
view raw MerkleRoot.ipynb hosted with ❤ by GitHub

Most of the code should be self explaining, but a few remarks are in order. First, the hash function that we use could of course by any hash function, but, to be consistent with what bitcoin is doing, we use a double SHA256 hash. Then, byte ordering is again to be observed. The bitcoin reference client uses the method GetHash() on the involved transactions to populate the leaves, and we know that this is providing the transaction ID in little endian byte order. The entire calculation then proceeds in little endian byte order. The JSON output, however, display the transaction IDs in big endian order and the same holds for the Merkle root. Thus we need to reverse the byte order before we build the tree and, when comparing our result with the Merkle root which is part of the block, reverse the byte order again.

The actual calculation of the Merkle root uses a very straightforward approach – instead of maintaining one long list as the original bitcoin reference representation did, we represent the nodes at each level of the tree by a separate list and use recursion. This is probably a bit less efficient, but maybe easier to understand.

In order to test our implementation, we finally use the Python requests library to get a block in JSON format from blockchain.info using their REST API. We then build the leaves of the tree by walking through the list of transaction IDs in the block, build the Merkle root and compare the result to the value which is part of the JSON block structure. Nice to see that they match!

What happens if a client wants to verify that a given transaction is really included in a block without downloading the full block? In this case, it can ask the bitcoin server (either via the RPC call gettxoutproof or as part of the GETDATA network message) to provide a Merkle block, which is a serialized instance of CMerkleBlock. Essentially, a Merkle block consists of a block header plus a branch of the Merkle tree that is required to verify that a specific transaction is part of the block. This eliminates the need to transfer all transactions (which can be several thousand) while still allowing the client to check that a transaction is really included in a block.

This completes our short discussion of Merkle trees. We are now able to calculate the Merkle root for inclusion in a block header given the list of transactions that a block contains. To be able to mine a block, there is one more thing that we need to understand and that we will cover in the next post in this series – mining difficulty and the proof-of-work target.