On the road again – serializing and deserializing bitcoin transactions

In this post, I will show you how to interpret a bitcoin transaction given in the raw format and, conversely, how a bitcoin transaction stored in a C++ (and later Python) object can be converted into a hexadecimal representation (a process called serialization). Ultimately, the goal of this and subsequent posts will be to create a bitcoin transaction from scratch in Python, to sign it and to publish it in a bitcoin network, without using any of the available bitcoin libraries.

The subject of serialization and deserialization in the bitcoin protocol is a bit tricky. At the end of the day, the truth is hidden somewhere in the reference implementation (so it is time to get the code from the GitHub repository if you have not done so yet). I have to admit that when I first started to work with that code, I found it not exactly easy to understand, given that it has been a few years (well, somewhere around 20 years to be precise) since I last worked with templates in C++. Still, the idea of this post is to get to the bottom of it, and so I will walk you through the most relevant pieces of the source code. But be warned: this will be neither an easy nor a short read. Alternatively, you can skip directly to the end, where the results are summarized, and ignore the details.

The first thing that we need is access to a raw (serialized) bitcoin transaction. This can be obtained from blockchain.info using the following code snippet.

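import requests

def get_raw_transaction(txid="ed70b8c66a4b064cfe992a097b3406fa81ff09641fe55a709e4266167ef47891"):
    # Ask blockchain.info for the raw transaction as a hexadecimal string
    url = 'https://blockchain.info/en/tx/' + txid + '?format=hex'
    r = requests.get(url)
    # Fail early on HTTP errors instead of returning an error page
    r.raise_for_status()
    return r.text.strip()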

If you print the result, you should get


Having that, we can now start to go through this byte by byte. You might even want to print that string and strike out the bytes as we go. To understand how serialization works in the reference implementation, we will have to study the header file serialize.h, which contains boilerplate code to support serialization. In addition, each individual data type contains specific serialization code. It is useful to compare our results with the human readable description of the transaction at blockchain.info.

To understand how the mechanism works, let us start at the function getrawtransaction in rpc/rawtransaction.cpp, which implements the corresponding RPC call. This function ends up calling TxToUniv in core_write.cpp and finally EncodeHexTx in the same file. Here an instance of the class CDataStream is created, which is defined in streams.h. For that class, the operator << is overloaded so that the function Serialize is invoked. Templates for this function are declared in serialize.h and tell us how the elementary data types as well as sets, vectors etc. are serialized. All composite classes need to implement their own Serialize method to fit into this scheme.

For a transaction, the method CTransaction::Serialize is defined in primitives/transaction.h and delegates the call to the function SerializeTransaction in the same file.

template<typename Stream, typename TxType>
inline void SerializeTransaction(const TxType& tx, Stream& s) {
    const bool fAllowWitness = !(s.GetVersion() & SERIALIZE_TRANSACTION_NO_WITNESS);

    s << tx.nVersion;
    unsigned char flags = 0;
    // Consistency check
    if (fAllowWitness) {
        /* Check whether witnesses need to be serialized. */
        if (tx.HasWitness()) {
            flags |= 1;
        }
    }
    if (flags) {
        /* Use extended format in case witnesses are to be serialized. */
        std::vector<CTxIn> vinDummy;
        s << vinDummy;
        s << flags;
    }
    s << tx.vin;
    s << tx.vout;
    if (flags & 1) {
        for (size_t i = 0; i < tx.vin.size(); i++) {
            s << tx.vin[i].scriptWitness.stack;
        }
    }
    s << tx.nLockTime;
}

Throughout this post, we will ignore the extended format that relates to the segregated witness feature and restrict ourselves to the standard format, i.e. to the case that the flag fAllowWitness above is false.

We see that the first four bytes are the version number, which is 2 in this case. Note that little endian encoding is used, i.e. the first byte is the least significant byte. So the version number 2 corresponds to the string 02000000.
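As a quick check, the little endian encoding of a four byte integer can be reproduced in Python with the standard struct module. This is only a small illustration; the helper name encode_int32 is my own and does not appear in the reference implementation.

```python
import struct

def encode_int32(value):
    """Pack a signed 32 bit integer into 4 bytes, least significant byte first."""
    return struct.pack("<i", value).hex()

print(encode_int32(2))  # 02000000
```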


Next, the transaction inputs and transaction outputs are serialized. These are vectors, and the mechanism for serializing vectors becomes apparent in serialize.h.

template<typename Stream, typename T, typename A, typename V>
void Serialize_impl(Stream& os, const std::vector<T, A>& v, const V&)
{
    WriteCompactSize(os, v.size());
    for (typename std::vector<T, A>::const_iterator vi = v.begin(); vi != v.end(); ++vi)
        ::Serialize(os, (*vi));
}

template<typename Stream, typename T, typename A>
inline void Serialize(Stream& os, const std::vector<T, A>& v)
{
    Serialize_impl(os, v, T());
}

We see that to serialize a vector, we first serialize the length of the vector, i.e. the number of elements, and then call the serialization method on each of the individual items. The length is serialized in a compact format (called CompactSize in the source code and often referred to as a varInt) which stores a number in 1, 3, 5 or 9 bytes, depending on its size. In this case, one byte is sufficient: this is the byte 03 after the version number. Thus we can conclude that the transaction has three transaction inputs.
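To make the compact length encoding more tangible, here is a minimal Python sketch of it, together with the "length first, then each element" pattern for vectors. The function names are my own choice for illustration and only loosely mirror WriteCompactSize; the one byte markers 0xfd, 0xfe and 0xff announce that a 2, 4 or 8 byte little endian value follows.

```python
def write_compact_size(n):
    """Encode a non-negative integer in 1, 3, 5 or 9 bytes."""
    if n < 0xfd:
        return n.to_bytes(1, "little")
    if n <= 0xffff:
        return b"\xfd" + n.to_bytes(2, "little")
    if n <= 0xffffffff:
        return b"\xfe" + n.to_bytes(4, "little")
    return b"\xff" + n.to_bytes(8, "little")

def read_compact_size(data, offset=0):
    """Decode a compact size at offset, return (value, new offset)."""
    first = data[offset]
    if first < 0xfd:
        return first, offset + 1
    width = {0xfd: 2, 0xfe: 4, 0xff: 8}[first]
    value = int.from_bytes(data[offset + 1:offset + 1 + width], "little")
    return value, offset + 1 + width

def serialize_vector(items, serialize_item):
    """Write the number of elements, then each element in order."""
    return write_compact_size(len(items)) + b"".join(serialize_item(i) for i in items)

print(write_compact_size(3).hex())      # 03
print(write_compact_size(253).hex())    # fdfd00
print(write_compact_size(70000).hex())  # fe70110100
```

For a transaction with three inputs, the length prefix is therefore the single byte 03, exactly as observed above.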

To understand the next bytes, we need to look at the method CTxIn::SerializationOp.

inline void SerializationOp(Stream& s, Operation ser_action) {

This is not very surprising: we see that the spent transaction output, the signature script and the sequence number are serialized in that order. The spent transaction output prevout is an instance of COutPoint, which has its own serialization method. First, the transaction ID of the previous transaction is serialized according to the method base_blob::Serialize defined in uint256.h. This will produce the hexadecimal representation in little endian encoding, so that we have to reverse the order bytewise to obtain the transaction ID.
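The bytewise reversal is easy to get wrong, because it is the order of the bytes that is reversed, not the order of the hexadecimal characters. A small Python sketch, using a made-up 32 byte value rather than a real transaction ID (the function name is mine):

```python
def txid_from_serialized(serialized_hex):
    """Reverse the byte order of a serialized 32 byte transaction ID."""
    return bytes.fromhex(serialized_hex)[::-1].hex()

# Made-up value 00 01 02 ... 1f, only used to show the effect of the reversal
serialized = bytes(range(32)).hex()
print(serialized)
print(txid_from_serialized(serialized))
```

Applying the function twice gives back the original string, so the same helper can be used in both directions.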

So in our example, the ID of the previous transaction is encoded in the part starting with 620f7b… in the first line and ending (a transaction ID always has 256 bits, i.e. 32 bytes, i.e. 64 characters) with the bytes …1c40e5f6 early in the second line. To get the real transaction ID, we have to reverse this byte by byte, i.e. the transaction ID is


The next four bytes still belong to the spent transaction and encode the index of the spent output in the list of outputs of the previous transaction. In this case this is 1, again encoded in little endian byte order, i.e. as 01000000. Thus we have now covered and understood the following part of the hex representation.


Going back to the serialization method of the class CTxIn, we now see that the next few bytes are the signature script. The format of the signature script is complicated and will be covered in a separate post. For today, we simply take this as a hexadecimal string. In our case, this string starts with 6a473044 …. in the second line and ends with … 541dafbd close to the end of line five.

Finally, the last two bytes in line five and the first two bytes in line six are the sequence number in little endian byte order.

We are now done with the first transaction input. There are two more transaction inputs that follow the same pattern; the last one ends, again with the sequence number, close to the end of line 15.
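The structure of a single transaction input can be summarized in a short Python sketch. This is not the code of the reference implementation, just my own illustration; the sample bytes are made up (a dummy previous transaction ID and a two byte placeholder instead of a real signature script).

```python
import struct

def read_compact_size(data, offset):
    """Decode a compact size at offset, return (value, new offset)."""
    first = data[offset]
    if first < 0xfd:
        return first, offset + 1
    width = {0xfd: 2, 0xfe: 4, 0xff: 8}[first]
    value = int.from_bytes(data[offset + 1:offset + 1 + width], "little")
    return value, offset + 1 + width

def parse_tx_input(data, offset=0):
    """Parse one transaction input, return (fields, new offset)."""
    # 32 bytes: ID of the previous transaction, stored in reversed byte order
    prev_txid = data[offset:offset + 32][::-1].hex()
    offset += 32
    # 4 bytes: index of the spent output, little endian
    (prev_index,) = struct.unpack_from("<I", data, offset)
    offset += 4
    # Length of the signature script, then the script itself
    script_len, offset = read_compact_size(data, offset)
    script_sig = data[offset:offset + script_len].hex()
    offset += script_len
    # 4 bytes: sequence number, little endian
    (sequence,) = struct.unpack_from("<I", data, offset)
    offset += 4
    fields = {"prev_txid": prev_txid, "prev_index": prev_index,
              "script_sig": script_sig, "sequence": sequence}
    return fields, offset

# Made-up input: dummy txid, output index 1, placeholder script abcd
raw = bytes(range(32)) + bytes.fromhex("01000000" + "02abcd" + "feffffff")
txin, offset = parse_tx_input(raw)
print(txin["prev_index"], txin["script_sig"], hex(txin["sequence"]))
```

The returned offset points to the first byte after the input, so the function can be called in a loop to consume all inputs of a transaction.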

Now we move on to the transaction outputs. Again, as this is a vector, the first byte (02) is the number of outputs. Each output is then serialized according to the respective method of the class CTxOut.

inline void SerializationOp(Stream& s, Operation ser_action) {

The first element is the value, which is of type CAmount. If we look this up in amount.h, we find that CAmount is simply a typedef for a 64 bit integer, so its serialization is covered by the templates in serialize.h and results simply in eight bytes in little endian order:


If we reorder and decode this, we obtain 686282 Satoshi, i.e. 0.00686282 bitcoin (one bitcoin is 100,000,000 Satoshi). The next object that is serialized is the public key script. Again, we leave the details to a later post, but remark that, as for the signature script, the first byte is the length of the remaining part of the script in bytes, so that we can figure out that the script is composed of the 0x19 = 25 bytes


For the second output, the pattern repeats itself. We have the amount and the public key script


of the second output.
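Decoding a transaction output follows the same scheme and can be sketched in a few lines of Python. Again, this is only an illustration under my own naming. The sample data encodes an amount of 686282 Satoshi followed by a 25 byte placeholder script whose middle part is filled with zeros, so it is not the script of our sample transaction.

```python
import struct

def read_compact_size(data, offset):
    """Decode a compact size at offset, return (value, new offset)."""
    first = data[offset]
    if first < 0xfd:
        return first, offset + 1
    width = {0xfd: 2, 0xfe: 4, 0xff: 8}[first]
    value = int.from_bytes(data[offset + 1:offset + 1 + width], "little")
    return value, offset + 1 + width

def parse_tx_output(data, offset=0):
    """Parse one transaction output, return (fields, new offset)."""
    # 8 bytes: amount in Satoshi as a signed 64 bit integer, little endian
    (value,) = struct.unpack_from("<q", data, offset)
    offset += 8
    # Length of the public key script, then the script itself
    script_len, offset = read_compact_size(data, offset)
    script_pubkey = data[offset:offset + script_len].hex()
    offset += script_len
    return {"value": value, "script_pubkey": script_pubkey}, offset

# 686282 = 0xa78ca, so the eight amount bytes are ca 78 0a 00 00 00 00 00
raw = bytes.fromhex("ca780a0000000000" + "19" + "76a914" + "00" * 20 + "88ac")
txout, offset = parse_tx_output(raw)
print(txout["value"])                    # 686282
print(len(txout["script_pubkey"]) // 2)  # 25
```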

Finally, there are four bytes left: 6fce0700. Going back to SerializeTransaction, we identify this as the lock time 0x7ce6f (511599 in decimal notation).
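If you want to double check this conversion, Python's built-in int.from_bytes does the byte reordering for us:

```python
# Interpret the last four bytes of the transaction as a little endian integer
lock_time = int.from_bytes(bytes.fromhex("6fce0700"), "little")
print(lock_time, hex(lock_time))  # 511599 0x7ce6f
```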

After going through all these details, it is time to summarize our findings. A bitcoin transaction is encoded as a hexadecimal string as follows.

  • the version number (4 bytes, little endian)
  • the number of transaction inputs (compact length encoding)
  • for each transaction input:
    • the ID of the previous transaction (32 bytes, reversed)
    • the index of the spent transaction output in the previous transaction (4 bytes, little endian)
    • the length of the signature script
    • the signature script
    • the sequence number (4 bytes, little endian)
  • the number of transaction outputs (compact length encoding)
  • for each transaction output:
    • the amount in Satoshi (8 bytes, little endian)
    • the length of the public key script
    • the public key script
  • the lock time (4 bytes, little endian)
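The list above translates almost line by line into a serializer. The following Python sketch is my own illustration and not the code from the repository mentioned below; it represents a transaction as a plain dictionary (a layout I made up for this example), ignores the segregated witness format, and uses dummy values throughout.

```python
import struct

def compact_size(n):
    """Encode a length in 1, 3, 5 or 9 bytes."""
    if n < 0xfd:
        return n.to_bytes(1, "little")
    if n <= 0xffff:
        return b"\xfd" + n.to_bytes(2, "little")
    if n <= 0xffffffff:
        return b"\xfe" + n.to_bytes(4, "little")
    return b"\xff" + n.to_bytes(8, "little")

def serialize_transaction(tx):
    """Serialize a transaction dictionary into a hexadecimal string."""
    out = struct.pack("<i", tx["version"])
    out += compact_size(len(tx["inputs"]))
    for txin in tx["inputs"]:
        out += bytes.fromhex(txin["prev_txid"])[::-1]  # reversed byte order
        out += struct.pack("<I", txin["prev_index"])
        script = bytes.fromhex(txin["script_sig"])
        out += compact_size(len(script)) + script
        out += struct.pack("<I", txin["sequence"])
    out += compact_size(len(tx["outputs"]))
    for txout in tx["outputs"]:
        out += struct.pack("<q", txout["value"])       # amount in Satoshi
        script = bytes.fromhex(txout["script_pubkey"])
        out += compact_size(len(script)) + script
    out += struct.pack("<I", tx["lock_time"])
    return out.hex()

# Dummy transaction: one input with an empty script, one output with a one byte script
example = {
    "version": 2,
    "inputs": [{"prev_txid": "11" * 32, "prev_index": 1,
                "script_sig": "", "sequence": 0xffffffff}],
    "outputs": [{"value": 1000, "script_pubkey": "51"}],
    "lock_time": 511599,
}
raw = serialize_transaction(example)
print(raw)
```

The output starts with 02000000 (the version), followed by 01 (one input), and ends with 6fce0700 (the lock time), which matches the pattern we found in the sample transaction.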

In my GitHub account, you will find a Python script Transaction.py that retrieves our sample transaction from the blockchain.info site and prints out all the information line by line. To run it, clone the repository using

$ git clone https://github.com/christianb93/bitcoin.git ; cd bitcoin

and then run the script

$ python Transaction.py

The script uses a few modules in the package btc, namely txn.py and serialize.py that essentially implement the serialization and deserialization routines discussed in this post.

That is it for today. In the next posts, I will start to look at a topic that we have more or less consistently ignored or oversimplified so far: scripts in the bitcoin world.
