Understanding bitcoin blocks

In order to analyse a block, obviously the first thing that we need is – a block. Fortunately, this is not difficult. We can either use the block explorer hosted by blockchain.info to get the second block in the blockchain in raw or JSON or use curl to retrieve the block from the command line, say in raw format.

$ curl https://blockchain.info/de/block/000000006a625f06636b8bb6ac7b960a8d03705d1ace08b1a19da3fdcc99ddbd?format=hex


For this post, I have chosen a block from the very early days of bitcoin, as this block contains only one transaction and is therefore rather small – current blocks typically contain more than 2.000 transactions and are too long to be displayed easily. This block is actually the third block on the blockchain and has height two. The height is the position of a block on the blockchain (ignoring forks). The block with height 0 is the very first block and actually hardcoded into the bitcoin software, this block is called the genesis block. The first blocks have all been mined (according to their time stamp) in the early morning hours of January 9th 2009.

If you want to follow this post step by step, I recommend to download and run the Jupyter notebook containing the respective code.

In order to understand a bitcoin block, let us again look at the source code of the reference implementation. If you have followed me through the dissection of a bitcoin transaction, you will not have any difficulties to understand the code in primitives/block.h and find that a block is made up of a block header (an instance of CBlockHeader) and the block data which is an instance of the derived class CBlock and contains, in addition to the header fields, the transactions that make up the block. Let us now go through the header fields one by one.

class CBlockHeader
    // header
    int32_t nVersion;
    uint256 hashPrevBlock;
    uint256 hashMerkleRoot;
    uint32_t nTime;
    uint32_t nBits;
    uint32_t nNonce;

The first field is the version number, in the usual little endian encoding. Here, the version is simply 1 – not really surprising for the second block that was ever mined. In more recently mined blocks, the version field looks different and is most often 0x20000000 – this is not just a version number any more but a bit mask used to indicate any soft forks that the miner is willing to support, as specified in BIP 009.

The next field is the hash of the previous block. Similar to transactions, this in practice identifies a block uniquely (and is a better identifier than the position in the block chain since it also works for blocks in a fork). Actually, this is not really the hash of the entire block, but of the block header only (GetHash is a method of CBlockHeader).

If we compare the previous blocks hash ID with the JSON output, we find that compared to the JSON output, the order of the bytes is reversed. We have already seen that byte reversal when looking at transactions – the ID of the previous transaction also appears reversed in the serialization compared to the output of a blockchain explorer or the bitcoin-cli tool Time to look at why this happens in a bit more detail.

If you look at primitives/block.h, you will find that the hash of the previous transaction is a uint256. This data type – a 256 bit integer – is defined in uint256.h and is a number. As every number on the x86 platform (on which the bitcoin software was developed), this number is internally stored in little endian byte order, i.e. the least significant bit first (look for instance at the method base_blob::GetUint64 to see this) and is serialized as a little endian byte string (see base_blob::Serialize). However, if it is displayed, it is converted to a big endian representation in the the function base_blob::GetHex(), which simply reverts the order bytewise.

An example where this happens is the JSON RPC server. In blockheaderToJSON in rpc/blockchain.cpp, for instance, we find the line

if (blockindex->pprev)
        result.push_back(Pair("previousblockhash", blockindex->pprev->GetBlockHash().GetHex()));

So we see that GetHex() is called explicitly to convert the little endian internal representation of the hash into the big endian display format. This explains why the order of the hash appears to be reversed in the serialized raw format.

Having clarified this, let us now proceed with our analysis of the raw block data. The next field is called the Merkle root. Essentially, this is a hash value of the transactions which are part of the block, aggregated in a clever way, a so called Merkle tree. We will look at Merkle trees in a separate post, for the time being let us simply accept that this is a hash of the underlying transactions. Again, note that the order of the bytes is reversed between the JSON representation (big endian) and the serialized format (little endian).

The next field is again easy, it is the creation time of the block as announced by the miner who did build the block. As usual in the Unix world, the time is stored as POSIX time, i.e. essentially in seconds since the first of January 1970.

Again, byte order is significant – the hex string b0bc6649 in our case first needs to be reversed and then transformed into an integer:

nTime = nTime = int.from_bytes(bytes.fromhex('b0bc6649'), "little")
import time

This should give you Friday, January 9th 2009 at 03:55:44.

The next field is called the bits – a name which would probably not win a prize for being a unique, meaningful and descriptive variable name. If you search the source code a bit, you will find that in fact, those bits represent the difficulty which controls the mining process – again, this is a matter for a separate post.

The last field in the header is again relevant for the mining process. It is called the nonce and is used as a placeholder that a miner can set to an arbitrary value in order to produce blocks with different hash values (actually, there is a second nonce hidden in the signature script of the coinbase transaction, we get to this point later when we discuss the mining in detail).

This completes the fields of the block header. In addition to these fields, a full block contains a vector holding the transactions that are contained in the block. As every vector, this vector is serialized by first writing out the number of elements – one in this case – followed by the individual transactions, encoded in the usual serialized format.

With this, we have completed our short tour through the layout of a block and are ready to summarize our findings in the following diagram.


In the next few posts, we will dig deeper into some of these fields and learn how Merkle trees are used to validate integrity and how the difficulty and the nonce play together in the mining process.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s