Python – Page 6 – LeftAsExercise

Hopfield networks: practice

After having discussed Hopfield networks from a more theoretical point of view, let us now see how we can implement a Hopfield network in Python.

First let us take a look at the data structures. We will store the weights and the state of the units in a class HopfieldNetwork. The weights are stored in a matrix, the states in an array.

class HopfieldNetwork:

    #
    # Initialize a Hopfield network with N
    # neurons
    #
    def __init__(self, N):
        self.N = N
        self.W = np.zeros((N,N))
        self.s = np.zeros((N,1))

Next we write a method for the update rule. In a matrix notation, the activation of unit i is given as the dot product of the current state and the i-th row of the weight matrix (or the i-th column, as the matrix is symmetric). Therefore we can use the following update function.

#
# Run one simulation step
#
def runStep(self):
    i = np.random.randint(0,self.N)
    a = np.matmul(self.W[i,:], self.s)
    if a < 0:
        self.s[i] = -1
    else:
        self.s[i] = 1

Finally, there is the Hebbian learning rule. If we store the sample set in a matrix S such that each row corresponds to one sample, the learning rule is equivalent to

$W = S^T S$

Thus we can again use the matrix multiplication from the numpy package, and our learning rule is simply

def train(self, S):
    self.W = np.matmul(S.transpose(), S)

Now we need some pattern to train our network. For the sake of demonstration, let us use a small network with 5 x 5 units, each of them representing a pixel in a grayscale 5 x 5 image. For this post, I have hardcoded five simple patterns, but we could of course use any other training set.

To illustrate how the Hopfield network operates, we can now use the method train to train the network on a few of these patterns that we call memories. We then take these memories and randomly flip a few bits in each of them, in other words we simulate random errors in the pattern. We then place the network in these states and run the update rule several times. If you want to try this yourself, get the script Hopfield.py from my GitHub repository.

The image above shows the result of this exercise. Here we have stored three memories – the first column of images – in the network. The second image in each row then shows the distorted versions of these patterns after flipping five bits randomly. The next images in each row show the state of the network after 20, 40, 60, 80 and 100 iterations of the update rule. We see that in this case, all errors could be removed and the network did in fact converge to the original image.

However, the situation becomes worse if we try to store more memories. The next image shows the outcome of a simulation with the same basic parameters, but five instead of three memories.

We clearly see that not a single one of the distorted patterns converges to the original image. Apparently, we have exceeded the capacity of the network. In his paper, Hopfield – based on theoretical considerations and simulations – argues that the network can only store approximately $0.15 \cdot N$ patterns, where N is the number of units. In our case, with 25 units, this would be approximately 3 to 4 patterns.

If we exceed this limit, we seem to create local minima of the energy that do not correspond to any of the stored patterns. This problem is known as the problem of spurious minima which can also occur if we stay below the maximum capacity – if you do several runs with three memories, you will find that also in this case, spurious minima can occur.

The update rule of the Hopfield network is deterministic, its energy can never increase. Thus if the system moves into one of those local minima, it can never escape again and gets stuck. An Ising model at a finite, non-zero temperature behaves differently. As the update rule is stochastic, it is possible that the system moves away from a local minimum in a Gibbs sampling step, and therefore has the chance to escape from a spurious minimum. This is one of the reasons why researchers have tried to come up with stochastic versions of the Hopfield network. One of these stochastic versions is the Boltzmann network, and we will start to look at its theoretical foundations in the next post in this series.

Hopfield networks: theory

Having looked in some detail at the Ising model, we are now well equipped to tackle a class of neuronal networks that has been studied by several authors in the sixties, seventies and early eighties of the last century, but has become popular by an article [1] published by J. Hopfield in 1982.

The idea behind this and earlier research is as follows. Motivated by the analogy between a unit in a neuronal network and a neuron in a human brain, researchers were trying to understand how the neurons needed to be organized to be able to create abilities like associative memories, i.e. a memory that can be navigated by associations that bring up additional stored memories. To explain how the human brain organizes the connections between the neurons in optimal (well, as least useful) way, analogies with physical systems like the Ising model covered in this post which also exhibit some sort of spontaneous self organization, were pursued.

In particular, the analogy with stability attracted attention. In many physical systems, there are stable states. If the system is put into a state which is sufficiently close to such a stable state, it will, over time, move back into that stable state. A similar property is desirable for associative memory systems. If, for instance, such a system has memorized an image and is then placed in a state which is somehow close to that image, i.e. only a part of the image or a noisy version of the image is presented, it should converge into the memorized state representing the original image.

With that motivation, Hopfield described the following model of a neuronal network. Our network consists of individual units that can be in any of two states, “firing” and “not firing”. The system consists of N such units, we will denote the state of unit i by $s_i$ .

Any two units can be connected, and there is a matrix W whose elements represent the strength of the connection between the individual units, i.e. $w_{ij}$ is the strength of the connection between the units i and j. We assume that no neuron is connected to ifself, i.e. that $w_{ii} = 0$ , and that the matrix of weights is symmetric, i.e. that $w_{ij} = w_{ji}$ .

The activation of unit i is then obtained by summing up the weighted values of all neurons connected to it, i.e. given by

$a_i = \sum_j w_{ij} s_j$

Hopfield used a slightly different notation in his paper and assigned the values 0 and 1 to the two states, but we will again use -1 and +1.

So how does the Hopfield network operate? Suppose that the network is in a certain state. i.e. some of the neurons will be “firing”, represented by the value +1, and others will be passive, represented by the value -1. We now choose a neuron at random and calculate its activation function according to the formula above. We then determine the new state by the rule

$s_i' = \begin{cases} +1 & a_i \geq 0 \\ -1 & a_i < 0 \end{cases}$

In most cases, the network will actually converge after a finite number of steps, i.e. this rule does not change the state any more. To see why this happens, let us consider the function

$E(s) = - \frac{1}{2} \sum_{i,j} w_{ij} s_i s_j$

which is called the energy function of the model. Suppose that we pass from a state s to a state s’ by applying the update rule above. Let us assume that we have updated neuron i and changed its state from $s_i$ to $s_i'$ . Let

$\Delta s_i = s_i' - s_i$

Using the fact that the matrix W is symmetric, we can then write

$E(s') = -\frac{1}{2} \sum_{p,q \neq i} s_p s_q - \sum_p w_{pi} s_i' s_p$

which is the same as

$-\frac{1}{2} \sum_{p,q \neq i} s_p s_q - \sum_p w_{pi} s_i s_p - \sum_p w_{pi} (s_i' - s_i) s_p$

Thus we find that

$E(s') = E(s) - \Delta s_i \sum_p w_{ip} s_p$

Now the sum is simply the activation of neuron i. As our update rule guarantees that the product of $\Delta s_i$ and the activation of unit i is never negative, this implies that during the upgrade process, the energy function will always increase or stay the same. Thus the state will settle in a local minimum of the energy function.

At this point, we can already see some interesting analogies with the Ising model. Clearly, the units in a Hopfield network correspond to the particles in an Ising model. The state (firing or not) corresponds to the spin (upward or downward). The energy is almost literally the same as the energy of the Ising model without an external magnetic field.

Also the update rules are related. Recall that during a Gibbs sampling step for an Ising model, we calculate the conditional probability

$P = \sigma(2 \beta \langle J_i, s \rangle)$

Here the scalar product is the equivalent of the activation, and we could rewrite this as

$P = \sigma(2 \beta a_i)$

Let us now assume that the temperature is very small, so that $\beta$ is close to infinity. If the activation of unit i is positive, the probability will be very close to one. The Gibbs sampling rule will then almost certainly set the spin to +1. If the activation is negative, the probability will be zero, and we will set the spin to -1. Thus the update role of a Hopfield network corresponds to the Gibbs sampling step for an Ising model at temperature zero.

At nonzero temperatures, a Hopfield network and an Ising model start to behave differently. The Boltzmann distribution guarantees that the state with the lowest energies are most likely, but as the sampling process proceeds, the random element built into the Gibbs sampling rule implies that a state can evolve into another of higher energy as well, even though this is unlikely. For the Hopfield network, the update rule is completely deterministic, and the states will always evolve into states of lower or at least equal energy.

The memories that we are looking for are now the states of minimum energy. If we place the system in a nearby state and let it evolve according to the update rules, it will move over time back into a minimum and thus “remember” the original state.

This is nice, but how do we train a Hopfield network? Given some state s, we want to construct a weight matrix such that s is a local minimum. More generally, if we have already defined weights giving some local minima, we want to adjust the weights in order to create an additional minimum at s, if possible without changing the already existing minima significantly.

In Hopfields paper, this is done with the following learning rule.

$w_{ij} = \begin{cases} \sum_s S^{(s)}_i S^{(s)}_j & i \neq j \\ 0 & i = j \end{cases}$

where $S^{(1)}, \dots, S^{(K)}$ are the states that the network should remember (in a later post in this series, we will see that this rule can be obtained as the low temperature limit of a training algorithm called contrastive divergence that is used to train a certain class of Boltzmann machines).

Thus a state S contributes with a positive value to $w_{ij}$ if $S_i$ and $S_j$ have the same sign, i.e. are in the same state. This corresponds to a rule known as Hebbian learning rule that has been postulated as a principle of learning by D. Hebb and basically states that during learning, connections between neurons are enforced if these neurons fire together ([2], chapter 4).

Let us summarize what we have done so far. We have described a Hopfield network as a fully connected binary neuronal network with symmetric weight matrices and have defined the update rule and the learning rule for these networks. We have seen that the dynamics of the network resembles that of an Ising model at low temperatures. We now expect that a randomly chosen initial state will converge to one of the memorized states and that therefore, this model can serve as an associative memory.

In the next post, we will put this to work and implement and train a Hopfield network in Python to study its actual behavior.

References

1. J. Hopfield, Neural networks and physical systems with emergent collective computational abilities, Proc. Nat. Acad. Sci. Vol. 79, No. 8 (1982), pp. 2554-2558
2. D.O. Hebb, The organization of behaviour, Wiley, New York 1949

On the road again – serializing and deserializing bitcoin transactions

In this post, I will show you how a bitcoin transaction presented in the raw format is to be interpreted and how conversely a bitcoin transaction stored in a C++ (and later Python) object can be converted into a hexadecimal representation (a process called serialization). Ultimately, the goal of this and subsequent posts will be to create a bitcoin transaction from scratch in Python, to sign it and to publish it in a bitcoin network, without using any of the available bitcoin libraries.

The subject of serialization and deserialization in the bitcoin protocol is a bit tricky. At the end of the day, the truth is hidden in the reference implementation somewhere (so time to get the code from the GitHub repository if you have not done so yet). I have to admit that when I first started to work with that code, I found it not exactly easy to understand, given that it has been a few years (well, somewhere around 20 years to be precise) since I last worked with templates in C++. Still, the idea of this post is to get to the bottom of it, and so I will walk you through the most relevant pieces of the source code. But be warned – this will not be an easy read and a bit lengthy. Alternatively, you can also skip directly to the end where the result is again summarized and ignore the details.

The first thing that we need is access to a raw (serialized) bitcoin transaction. This can be obtained from blockchain.info using the following code snippet.

import requests

def get_raw_transaction(txid="ed70b8c66a4b064cfe992a097b3406fa81ff09641fe55a709e4266167ef47891"):
    url = 'https://blockchain.info/en/tx/' + txid + '?format=hex'
    r = requests.get(url)
    return r.text

If you print the result, you should get

0200000003620f7bc1087b0111f76978ef747001e3ae0a12f254cbfb858f
028f891c40e5f6010000006a47304402207f5dfc2f7f7329b7cc731df605
c83aa6f48ec2218495324bb4ab43376f313b840220020c769655e4bfcc54
e55104f6adc723867d9d819266d27e755e098f646f689d0121038c2d1cbe
4d731c69e67d16c52682e01cb70b046ead63e90bf793f52f541dafbdfeff
fffff15fe7d9e0815853738ce47deadee69339e027a1dfcfb6fa887cce3a
72626e7b010000006a47304402203202e6c640c063989623fc782ac1c9dc
3c6fcaed996d852ec876749ba63db63b02207ef86e262ad4b4bc9cebfadb
609f52c35b0105e15d58a5ecbecc5e536d3a8cd8012103dc526ca188418a
b128d998bf80942d66f1b3be585d0c89bd61c533bddbdaa729feffffff84
e6431db86833897bab333d844486c183dd01e69862edea442e480c2d8cb5
49010000006a47304402200320bc83f35ceab4a7ef0f8181eedb5f54e3f6
17626826cc49c8c86efc9be0b302203705889d6aed50f716b81b0f3f5769
d72d1b8a6b59d1b0b73bcf94245c283b8001210263591c21ce8ee0d96a61
7108d7c278e2e715ac6d8afd3fcd158bee472c590068feffffff02ca780a
00000000001976a914811fb695e46e2386501bcd70e5c869fe6c0bb33988
ac10f59600000000001976a9140f2408a811f6d24ab1833924d98d884c44
ecee8888ac6fce0700

Having that, we can now start to go through this byte by byte – you might even want to print that string and strike out the bytes as we go. To understand how serialization works in the reference implementation, we will have to study the the header file serialize.h containing boilerplate code to support serialization. In addition, each individual data type contains specific serialization code. It is useful to compare our results with the human readable description of the transaction at blockchain.info.

To understand how the mechanism works, let us start at the function getrawtransaction in rpc/rawtransaction.cpp which is implementing the corresponding RPC call. This function ends up calling TxToUniv in core_write.cpp and finally EncodeHexTx in the same file. Here an instance of the class CDataStream is created which is defined in streams.h. For that class, the operator << is overwritten so that the function Serialize is invoked. Templates for this method are declared in serialize.h and will tell us how the individual data types are serialized in each individual case for the elementary data types and sets, vectors etc.. All composite classes need to implement their own Serialize method to fit into this scheme.

For a transaction, the method CTransaction::Serialize is defined in primitives/transaction.h and delegates the call to the function SerializeTransaction in the same file.

template
inline void SerializeTransaction(const TxType& tx, Stream& s) {
    const bool fAllowWitness = !(s.GetVersion() & SERIALIZE_TRANSACTION_NO_WITNESS);

    s << tx.nVersion;
    unsigned char flags = 0;
    // Consistency check
    if (fAllowWitness) {
        /* Check whether witnesses need to be serialized. */
        if (tx.HasWitness()) {
            flags |= 1;
        }
    }
    if (flags) {
        /* Use extended format in case witnesses are to be serialized. */
        std::vector vinDummy;
        s << vinDummy;
        s << flags;
    }
    s << tx.vin;
    s << tx.vout;
    if (flags & 1) {
        for (size_t i = 0; i < tx.vin.size(); i++) {
            s << tx.vin[i].scriptWitness.stack;
        }
    }
    s << tx.nLockTime;
}

Throughout this post, we will ignore the extended format that relates to the segregated witness feature and restrict ourselves to the standard format, i.e. to the case that the flag fAllowWitness above is false.

We see that the first four bytes are the version number, which is 2 in this case. Note that little endian encoding is used, i.e. the first byte is the least significant byte. So the version number 2 corresponds to the string

02000000

Next, the transaction inputs and transaction outputs are serialized. These are vectors, and the mechanism for serializing vectors becomes apparent in serialize.h.

template
void Serialize_impl(Stream& os, const std::vector& v, const V&)
{
    WriteCompactSize(os, v.size());
    for (typename std::vector::const_iterator vi = v.begin(); vi != v.end(); ++vi)
        ::Serialize(os, (*vi));
}

template
inline void Serialize(Stream& os, const std::vector& v)
{
    Serialize_impl(os, v, T());
}

We see that to serialize a vector, we first serialize the length of the vector, i.e. the number of elements, and then call the serialization method on each of the individual items. The length is serialized in a compact format called a varInt which stores a number in 1 – 9 bytes, depending on its size. In this case, one byte is sufficient – this is the byte 03 after the version number. Thus we can conclude that the transaction has three transaction inputs.

To understand the next bytes, we need to look at the method CTxIn::SerializeOp.

template
inline void SerializationOp(Stream& s, Operation ser_action) {
    READWRITE(prevout);
    READWRITE(scriptSig);
    READWRITE(nSequence);
}

This is not very surprising – we see that the spent transaction output, the signature script and the sequence number are serialized in that order. The spent transaction prevout is an instance of COutPoint which has its own serialization method. First, the transaction ID of the previous transaction is serialized according to the method base_blob::Serialize defined in uint256.h. This will produce the hexadecimal representation in little endian encoding, so that we have to reverse the order bytewise to obtain the transaction ID.

So in our example, the ID of the previous transaction is encoded in the part starting with 620f7b… in the first line and ending (a transaction ID has always 256 bit, i.e. 32 bytes, i.e. 64 characters) with the bytes …1c40e5f6 early in the second line. To get the real transaction ID, we have to revert this byte for byte, i.e. the transaction ID is

f6e5401c898f028f85fbcb54f2120aaee3017074ef7869f711017b08c17b0f62

The next four bytes still belong to the spent transaction and encode the index of the spent output in the list of outputs of the previous transaction. In this case this is 1, again encoded in little endian byte order, i.e. as 01000000. Thus we have now covered and understood the following part of the hex representation.

0200000003620f7bc1087b0111f76978ef747001e3ae0a12f254cbfb858f
028f891c40e5f601000000

Going back to the serialization method of the class CTxIn, we now see that the next few bytes are the signature script. The format of the signature script is complicated and will be covered in a separate post. For today, we simply take this as a hexadecimal string. In our case, this string starts with 6a473044 …. in the second line and ends with … 541dafbd close to the end of line five.

Finally, the last two bytes in line five and the first two bytes in line six are the sequence number in little endian byte order.

We are now done with the first transaction input. There are two more transaction inputs that follow the same pattern, the last one ends again with the sequence number close to the end of line 15.

Now we move on to the transaction outputs. Again, as this is a vector, the first byte (02) is the number of outputs. Each output is then serialized according to the respective method of the class TxOut.

template
inline void SerializationOp(Stream& s, Operation ser_action) {
    READWRITE(nValue);
    READWRITE(scriptPubKey);
}

The first element is the value, which is an instance of the class CAmount. Again, we can look up the serialization method of this class in amount.h and find that this is simply a 64 bit integer, so its serialization method is covered by the templates in serialize.h and results simply in eight bytes in little endian order:

ca780a0000000000

If we reorder and decode this, we obtain 686282 Satoshi, i.e. 0.0686282 bitcoin. The next object that is serialized is the public key script. Again, we leave the details to a later post, but remark that (which is also true for the signature script) the first byte is the length of the remaining part of the script in bytes, so that we can figure out that the script is composed of the 0x19 = 25 bytes

76a914811fb695e46e2386501bcd70e5c869fe6c0bb33988ac

For the second output, the pattern repeats itself. We have the amount and the public key script

76a9140f2408a811f6d24ab1833924d98d884c44ecee8888ac

of the second output.

Finally, there are four bytes left: 6fce0700. Going back to SerializeTransaction, we identify this as the lock time 0x7ce6f ( 511599 in decimal notation).

After going through all these details, it is time to summarize our findings. A bitcoin transaction is encoded as a hexadecimal string as follows.

The version number (4 bytes, little endian)
The number of transaction inputs
For each transaction input:
- the ID of the previous transaction (reversed)
- the index of the spent transaction output in the previous transaction (4 bytes, little endian)
- the length of the signature script
- the signature script
- the sequence number (4 bytes, little endian)
The number of transaction outputs
For each transaction output:
- the amount (eight bytes, little endian encoding) in Satoshi
- the length of the public key script
- the public key script
the locktime (four bytes, little endian)

In my GitHub account, you will find a Python script Transaction.py that retrieves our sample transaction from the blockchain.info site and prints out all the information line by line. To run it, clone the repository using

$ git clone https://github.com/christianb93/bitcoin.git ; cd bitcoin

and then run the script

$ python Transaction.py

The script uses a few modules in the package btc, namely txn.py and serialize.py that essentially implement the serialization and deserialization routines discussed in this post.

That is it for today. In the next posts, I will start to look at a topic that we have more or less consequently ignored or oversimplified so far: scripts in the bitcoin world.

Transactions in the bitcoin network

In my previous posts on the bitcoin protocol, I have described those objects that constitute participants – private and public keys and bitcoin addresses. Now we will look at those objects that represent actual transfers of bitcoins between these participants, namely at transactions.

Essentially, a bitcoin transaction consists of two parts. First, a transaction contains a list of one or more transaction outputs. A transaction output describes the target of the bitcoin transfer and basically consists of a recipient, usually identified by the bitcoin address, and an amount that this recipient is supposed to receive. A transaction can have one or more outputs and thus pay bitcoins to different recipients as part of one transaction (in fact, this is typically the case due to the need for change as we will see later).

There are different types of transaction outputs. The most common one is called a Pay to Public Key Hash (P2PKH) transaction output and refers to a recipient by a hash of the public key of that recipient, which is essentially the same as the bitcoin address. Only the owner of that public key, i.e. technically speaking whoever has access to the private key, can spend these transaction outputs – we will see later how the signing process in the blockchain takes care of this.

Similar to outputs, that determine the target of the payment, there are transaction inputs, that determine the source of the payment. Each transaction input refers to an earlier transaction output. Transaction outputs that are also consumed by some transaction input are called spent, outputs that are not yet referenced by any transaction input are called unspent transaction outputs, abbreviated UTXO.

Let us look at an example to illustrate this. Suppose Alice wants to transfer 1.0 BTC to Bob. She (respectively her wallet software) will first scan all the unspent transaction outputs available to Alice, i.e. all transaction outputs that refer to public keys for which Alice has the private key. In the example displayed above, her wallet might locate two transactions (the transactions on the left of the picture) that together contain three transaction outputs with amounts 0.3, 0.5 and 0.4 (the outputs not circled in red).

These outputs sum up to 1.2 BTC and are therefore sufficient to fund the payment to Bob. Alice will therefore construct a transaction – displayed in the middle of the picture above – that has three inputs, each of them referring to the selected transaction outputs. She could now try to add only one output to the transaction and put Bobs public key hash into this output, but this would mean that she transfers 1.2 BTC to Bob, not only 1.0 BTC. Therefore Alice will add a second transaction output to her transaction, that transfers a certain amount called the change back to herself. Many wallets use dedicated addresses for this that are called change addresses.

In the example above, the change – represented by the second transaction output – is 0.1 BTC. Thus the total inputs of the transaction sum up to 1.2 BTC, the total outputs sum up to 1.1 BTC. The difference is called the fee and will be credited to the miner who creates the block in which the transaction will be included – we will look at the process in mining in a separate post.

We now see that there is no such thing as a “bitcoin balance” stored somewhere in the blockchain. There are only transactions, and ownerhip of bitcoin is equivalent to owning the private key that matches unspent transaction outputs. Transactions form a chain, where each transaction is linked to previous transactions via the transaction inputs, and this chain represents the flow of bitcoin ownership.

You might ask yourself whether this is not a “chicken and egg” problem. If each transaction can only spend bitcoins that are present in previously unspent transaction outputs, where do all the bitcoins initially come from? The answer is again provided by the process of mining, where special bitcoin transactions called coinbase transactions are generated that only have an output (and a dummy input) so that bitcoin supply is created.

With this preparation, let us now take a look at the source code of the reference implementation. The transaction data structure is defined in primitives/transaction.h. After removing some comments and constants, the relevant part of the code is

class CTransaction
{
public:
    const int32_t nVersion;
    const std::vector vin;
    const std::vector vout;
    const uint32_t nLockTime;

We see that in addition to the attributes that we expect – a vector of transaction inputs and a vector of transaction outputs – a transaction has two additional attributes. The first attribute is the version number. The current version number is 2 (CURRENT_VERSION in the header file), but you will also find older transactions with version number one. The second additional field is the locktime, which can be used to define an earliest time (or block) at which the transaction can be added to the chain.

In the same header file, the transaction input and the transaction output structure are defined. We start with the transaction output.

class CTxOut
{
public:
CAmount nValue;
CScript scriptPubKey;

We see that a transaction output consists of a value that represents the amount that the transaction transfers, and a field called scriptPubKey which contains essentially the hash of the public key of the recipient – we will look at scripts in the bitcoin protocol in more detail in a later post. The definition of CAmount is located in amount.h:

/** Amount in satoshis (Can be negative) */
typedef int64_t CAmount;

static const CAmount COIN = 100000000;
static const CAmount CENT = 1000000;

The amount is thus specified in a unit called Satoshi which is therefore the smallest unit of bitcoin that can be transferred. The constant COIN is the number of Satoshi that comprises one bitcoin and is $10^8$ .

The definition of the transaction input is similar. Ignoring a feature called segregated witness, the relevant part is

class CTxIn
{
public:
COutPoint prevout;
CScript scriptSig;
uint32_t nSequence;

Here COutPoint is a class that refers to a previous transaction output – the spent transaction output – as expected from the picture above, and which consists of the ID (i.e. hash value) of the transaction that contains the previous output as well as the index of the output in the vector of all transaction outputs of this transaction. The attribute scriptSig contains roughly speaking a signature of the entire transaction that has been produced with the private key that belongs to the public key referenced in the spent transaction output. Finally, the field nSequence is called the sequence number and can be used in combination with the locktime – again we will not get into details on this in this post yet.

The image above summarizes what we have learned so far about the structure of a bitcoin transaction. Time to take a look at a real transaction. The page blockchain.info offers an API to retrieve transactions from the bitcoin blockchain. So let us open a terminal and enter

$ curl https://blockchain.info/en/tx/ed70b8c66a4b064cfe992a097b3406fa81ff09641fe55a709e4266167ef47891?format=hex 0200000003620f7bc1087b0111f76978ef747001e3ae0a12f254cbfb858f028f891c40e5f6010000006a47304402207f5dfc2f7f7329b7cc731df605c83aa6f48ec2218495324bb4ab43376f313b840220020c769655e4bfcc54e55104f6adc723867d9d819266d27e755e098f646f689d0121038c2d1cbe4d731c69e67d16c52682e01cb70b046ead63e90bf793f52f541dafbdfefffffff15fe7d9e0815853738ce47deadee69339e027a1dfcfb6fa887cce3a72626e7b010000006a47304402203202e6c640c063989623fc782ac1c9dc3c6fcaed996d852ec876749ba63db63b02207ef86e262ad4b4bc9cebfadb609f52c35b0105e15d58a5ecbecc5e536d3a8cd8012103dc526ca188418ab128d998bf80942d66f1b3be585d0c89bd61c533bddbdaa729feffffff84e6431db86833897bab333d844486c183dd01e69862edea442e480c2d8cb549010000006a47304402200320bc83f35ceab4a7ef0f8181eedb5f54e3f617626826cc49c8c86efc9be0b302203705889d6aed50f716b81b0f3f5769d72d1b8a6b59d1b0b73bcf94245c283b8001210263591c21ce8ee0d96a617108d7c278e2e715ac6d8afd3fcd158bee472c590068feffffff02ca780a00000000001976a914811fb695e46e2386501bcd70e5c869fe6c0bb33988ac10f59600000000001976a9140f2408a811f6d24ab1833924d98d884c44ecee8888ac6fce0700

The curl command (you could also open the URL in a browser) retrieves a transaction in raw hexadecimal format, specifying the transaction ID as an input (the transaction ID is essentially a hash of the transaction, more on this later). The output is a hexadecimal string containing the requested transaction. This is called a serialized version of the transaction and is important for a number of reasons. First, this is the format that goes over the wire if two nodes in the bitcoin network exchange the transaction. Second, when a transaction is signed, included in a block or when its transaction ID is formed, this serialized representation is the basis. In the next post in this series, we will learn how to encode this representation to match it with the structure described above.

Until then, you might want to play around with the transaction browser at bitcoin.info – just remove the ?format=hex from the URL above and you will get a human readable version of the transaction which will allow you to locate some of the elements (inputs, outputs and amounts) discussed in this post.

Boltzmann machines, spin, Markov chains and all that

The image above displays a set of handwritten digits on the left. They look a bit like being sketched on paper by someone in a hurry and then scanned and digitalized, not very accurate but still mostly readable – but they are artificial, produced by a neuronal network, more precisely a so called restricted Boltzmann machine.

On the right hand side, you see the (core part of) the code that has been used to produce this image. These are about forty lines of code, and there is some code around it which is not shown, but stripping off all the comments and boilerplate code, we could probably fit the algorithm into less than fifty lines of code.

I found this contrast always fascinating. Creating something that resembles handwritten digits sounds incredibly complex, but can be done with an algorithm that can be quenched into a comparatively short program – how does that work? So I started to dig deeper, trying to understand neuronal networks in general and in particular Boltzmann machines – the mathematical foundations, the algorithm and the implementation.

Boltzmann machines are a rather special class of neuronal networks that have a reputation of being hard to train, but are important from a theoretical point of view, being closely related to seemingly remote fields like thermodynamics, statistical mechanics and stochastical processes. That makes them interesting, but also hard to understand. I embarked on that journey a couple of months ago, and I thought it would be nice to create a series of blog posts on this. My current thinking is to have one or two posts on each of the following topics over the next couple of weeks.

Overview – what are Boltzmann machines (this post)
Background from statistical mechanics and the Boltzmann distribution
The Ising model and Gibbs sampling
Hopfield networks: theory
Hopfield networks: practice
From Hopfield networks to Boltzmann machines
Restricted Boltzmann machines
Contrastive divergence and PCD
Implementation with TensorFlow
Basics of Markov chains
Finite Markov chains and recurrent Markov chains
The Metropolis Hasting algorithm

This is a lot of content, and please do not expect to see one post every other day. But maybe I should just start and we will see how it goes….

So let me try to roughly sketch what a Boltzmann machine is. First, it is a neuronal network. As such it is composed of units that can be compared to the neurons in the nervous system. Similar to a neuron, each unit has an input and an output. The output of a neuron can serve as input for another neuron. In general, every neuron receives inputs from many other neurons and delivers outputs to many other neurons.

The diagram above shows a very simple neuronal network. It consists of four neurons. The three of them on the left serve as input to the network. Think of them as the equivalent of cells in – say – your visual cortex that are activated by some external stimulus. The cell on the right is the output of the network. Its activation is computed based on the outputs of the neurons on the left and some parameters of the network called the weights which model the strength of the connection between the neurons and which we denote by $w_i, i = 0, 1, 2$ , as follows.

$o = \sigma(w_0 x_0 + w_1 x_1 + w_2 x_2) = \sigma(w^T X)$

Here $\sigma$ is a function called the activation function and $X$ is the vector that is formed by $x_0, x_1$ and $x_2$ . There are a few standard choices for the activation function, a common one being the sigmoid function.

How is such a network applied? Let us look at this for a problem which is the “Hello World” of machine learning – recognition of handwritten digits. In that problem, you start with a collection of digitized images of handwritten digits, like those that are known as the MNIST database. These images have 28×28 = 784 pixels. We want to design a neuronal network that can classify these images. Such a network should consequently have 784 input units and 10 output units (bear with me that I did not produce a picture of that network, even though it would probably be fun to do this with Neo4J). We present an image to the network by setting the value of the input unit i to the intensity of pixel i. Our aim is to adjust the weights in such a way that if the image represents digit n, only output n is significantly different from zero. This will allow us to classify unknown images – we simply present the image to the network and then see which output is activated.

But how do we find the correct values for the weights? We need weights that connect each of the ten output units to each of the 784 input units, i.e. we have 7840 weights. Thus we are looking for a point in a 7840 dimensional vector space – not easy. Here the process of training comes into play. We take our set of sample images and present them to the network, with initially randomly chosen weights. The output will then not be what we want, but differ from the target output by an error. That error can be expressed as a function of the weights. The task is then to find a minimum of the error function, and there are ways to do this, most notably the procedure which is known as gradient descent.

As the name suggests, we need the gradient of the error function for that purpose. Fortunately, for the type of neuronal network that we have sketched so far, the gradient can be computed fairly easily – in fact the activation function is chosen on purpose to make dealing with the derivatives easy. We end up with a comparatively simple training algorithm for this type of network and maybe I will show how a simple implementation in Python in a later post – but for now let us move on to Boltzmann machines.

Boltzmann machines or more precisely restricted Boltzmann machines (RBMs) are also composed of units and weights, but work a bit differently. There are inputs, which are usually called visible units. But there are no classical outputs. Instead, there is a layer of units that is connected to the visible units and is called hidden units, as shown in the following picture.

In the example of handwritten digits again, you would again have 784 visible units. However, the hidden units would not obviously correlate with the digit represented by the input. Instead, you would have a more or less arbitrary number of hidden units, say 300.

During training, you present the network one image, again by setting the values of the visible units (the input) to the pixel intensities. You then compute the value of the hidden units – but you do not do this deterministically, but bring in some random element. Roughly speaking, if the combined input to a hidden unit is p, you set the value of the unit to one with probability p. Then this process is repeated, this time starting from the hidden units. This gives you certain values for the visible units. You then compare this value to the original value and try to adjust the weights such that you get as closely as possible to the original input (this is not exactly true and a not very precise description of one of the possible learning algorithms called contrastive divergence, but we will get more precise later on).

If you succeed, you will be able to reconstruct the value of the visible units from the values of the hidden units. But there are less hidden units, so that the network has apparently learned a condensed representation of the input that still captures the essence of the input. In the case of digits, you can visualize the state of the network after training and obtain something like

These are visualizations of the weights connected to some of the hidden units of a Bolzmann machine after the training phase. We can see that some of the units have appearantly “learned” some characteristic features of the digits, like the vertical stroke that appears in the digits one and seven. These features can be used for several purposes.

First, we can use the features as input for other neuronal networks. We now have only 300 inputs instead of originally 784, and that might simplify the problem a bit and make the process of training the next network easier.

Or, we could start with some random values for the hidden units and calculate the resulting values of the visible units to create sample images – and in fact this is basically what I have done to produce the samples at the top of this post (using an algorithm called PCD, but we will get to this).

Note that Boltzmann machines differ significantly from the type of neural network that we have considered earlier. One major difference is that to train a Boltzmann machine, you do not need to know the digit that the image represents. At no point have we used the information that the images in our database represent ten different digits – that information was not used in the design of the network nor during the training. This approach to machine learning is called unsupervised learning and is obviously very versatile – you do not have to tell the machine what the structure of the data is, it will detect the features independently.

Second, a Boltzmann machine can not only classify images, but can also create images that resemble a given set of input data. That can be used to reconstruct partially available images or to create new images from scratch – networks with this ability are called generative networks.

The price we have to pay for this is that Boltzmann machines are hard to train. The point is that we can still define some sort of error function, but we cannot easily calculate its gradient – there is no straightforward analytic expression for it that could be evaluated within a reasonable amount of time (if you know statistical physics a bit, that might remind you of the partition function, and that is more than pure coincidence, as I will show you in a later post). So we need to approximate the gradient. Technically, the gradient is an integral

$\int f(x) dx$

for some known, but complicated function $f$ . Even if we cannot find an analytic expression for this, we can try to approximate it. Your first idea might be “Riemann sums”, but it turns out that this is not a good idea, as our function lives in the space of all weights which has a very high dimension. Instead, we will use an approach called Monte Carlo integration where we represent the integral as an expectation value, draw a sample and approximate the expectation value by the sample average. This is where stochastical methods like Markov chains will come into play. And finally we will see that the behaviour of our network during training has some striking analogies with the behaviour of certain physical systems like solids exposed to a magnetic field at low temperatures, which are described by a model called the Ising model, and learn how techniques that physicists have developed for this type of problems apply to neuronal networks.

That is it for today – I hope I could give you a rough idea of what is ahead of us. At least I hope that you start to be curious how all this works out – so looking forward to the next post where I will start with some background from physics.

Keys in the bitcoin network: the public key

In my last post, we have looked in some detail at the private key – how it is generated and how it can be decoded and stored. Let us now do the same with the public key.

Recall that a public key is simply a point on the elliptic curve SECP256K1 that is used by the underlying ECDSA algorithm – in fact it is obtained by multiplying the generator point on the curve by our private key. As any point on the curve, it therefore has an x-coordinate and a y-coordinate, both being 32 bytes unsigned integers. So one way to encode the public key would be as follows.

take the x-coordinate as a point, represented by an integer smaller than p
convert this into a 32 byte hexadecimal string, using for instance big endian encoding
do the same for the y-coordinate
and concatenate these two strings to obtain a single 64 byte hexadecimal string

This encoding is simple, but it has a drawback. Remember that we encode not just a random pair of integers, but a point on the curve, so the x-coordinate and y-coordinate are related by the curve equation

$y^2 = x^3 + ax + b$

Thus given x, we almost know y – we know the square of y modulo p, and there can be at most two different roots of this equation. So we could reconstruct y if we have x and an additional bit that tells us which of the two solutions we need.

Let us now assume that p is odd. If y is a solution of the equation for a given value of x, then p – y (which is -y modulo p) is the second solution. As p is odd, exactly one of the two numbers y and p – y is even. We can therefore use an additional bit that is equal to y modulo 2 to distinguish the two solutions. It is convention to store this bit in a full additional byte, using the value 2 if y is even and the value 3 if y is odd, so that we obtain a representation of the public key (and in fact any other point on the curve) in at most 33 bytes: at most 32 bytes for the value of the x-coordinate and the additional byte containing the value of y modulo 2. This representation is called the compressed representation (see for instance the publication of the SECG, section 2.3).

If there is a compressed representation, you might expect that there is also an uncompressed representation. This is simply the representation that we have described above, i.e. storing both x and y, with an additional twist: to be able to distinguish this from a compressed representation that always starts with 0x02 or 0x03, a leading byte with value 0x04 is added so that the total length of an uncompressed representation is at most 65 bytes. Since version 0.6.0, the bitcoin reference implementation defaults to using compressed keys (see the function CWallet::GenerateNewKey).

Let us summarize what we have learned so far in a short Python code snippet that will take a private key (stored as integer in the variable d), calculate the corresponding point on the elliptic curve SECP256K1 using the ECDSA library and create a compressed representation of the result.

#
# Determine the public key from the
# secret d
#
import ecdsa
curve = ecdsa.curves.SECP256k1
Q = d * curve.generator
#
# and assemble the compressed representation
#
x = Q.x()
y = Q.y()
pubKey = x.to_bytes(length=32, byteorder="big")
pubKey = binascii.hexlify(pubKey).decode('ascii')
if 1 == (y % 2):
    pubKey = "03" + pubKey
else:
    pubKey = "02" + pubKey
print("Compressed key:  ", pubKey)

This way of encoding a public key is in fact not specific to the bitcoin network, but a standard that is used whenever a point on an elliptic curve needs to be encoded – see for instance RFC5480 by the IETF which is part of the X.509 standard for certificates.

However, this is still a bit confusing. If you known the nuts and bolts of the bitcoin protocol a bit, you will have seen that participants publish something that is called an address which is a string similar to

mx5zVKcjohqsu4G8KJ83esVxN52XiMvGTY

That does not look at all like a compressed or uncompressed public key. We are missing something.

The answer is that an address is in fact not a public key, but it is derived from a public key. More precisely, it is an encoded version of a hash value of the public key. So given the address, it is easy to verify that this address belongs to a certain public key, but it is very hard to reconstruct the public key given the address.

To understand the relation between a public key and an address better, it is again time to take a look at the source code of the reference client. A good starting point is the RPC method getnewaddress. This method is defined in the file wallet/rpcwallet.cpp and creates an instance of the class CBitcoinAddress which in turn is – surprise – derived from our old friend CBase58Data. The comments are quite helpful, and it is not difficult to figure out that a bitcoin address is obtained as follows from a public key.

create a hexadecimal compressed representation of the public key
apply a double hash to turn this into a sequence of 20 bytes – first apply the hash algorithm SHA256, then RIPEMD160 (this is called a Hash160 in the bitcoin terminology as the output will have 160 bits)
add a prefix to mark this as a public key address – the prefix is again defined in chainparams.cpp and and is zero for the main network and 111 for the test networks
take the hash256 checksum and append the first four bytes
apply Base58 decoding to the result

This is already very similar to what we have seen before and can be done in a few lines of Python code.

def hash160(s):
    _sha256 = hashlib.sha256(s).digest()
    return hashlib.new("ripemd160", _sha256).digest()
#
# Apply hash160
#
keyId = hash160(bytes.fromhex(pubKey))
#
# Append prefix for regtest network
#
address = bytes([111]) + keyId
#
# Add checksum
#
chk = hash256(address)[:4]
#
# and encode
#
address = btc.utils.base58Encode(address + chk)
print("Address:         ", address)

Heureka! If we run this, we get exactly the address mx5zVKcjohqsu4G8KJ83esVxN52XiMvGTY that the bitcoin client returned when we started our little journey at the beginning of the post on private keys.

As always, the full source code is also available on GitHub repository. If you want to run the code, simply enter

$ git clone https://github.com/christianb93/bitcoin.git
$ cd bitcoin
$ python Keys.py

That was it for today. We have now covered the basics of what constitutes participants in the bitcoin network. In the next few posts in this series. we will look at the second main object in the bitcoin world – transactions. We will learn how to interpret transactions, and will eventually be able to manually create a transaction to instruct a payment, sign it, hand it over to our test network and see how it is processed.

Keys in the bitcoin network: the private key

In my last post, I have shown you how arithmetic on elliptic curves can be used to create and verify digital signatures. We have seen that every party that creates a signature is represented by a private key – kept securely – and a public key, which is made available to everyone who wants to verify the signature. In a blockchain, digital signatures are used to verify ownership of bitcoins, and therefore private and public keys play a pivotal role in the bitcoin network. Bitcoin transaction outputs refer to public keys, and only the person that is in control of the matching private key can spend the bitcoin. Thus it is worth to take a closer look at how keys are represented in the bitcoin protocol.

Bitcoin uses the ECDSA signature algorithm to sign messages and verify signatures. Therefore a private key in the bitcoin network is simply an integer. More precisely, it is an integer between one and the order of the generator of an elliptic curve SECP256K1 (if that sounds like gibberish to you, you should read my previous post on this). Generating private keys is therefore very easy – you simply randomly select a 32 byte integer until you find one which is below that order (I am cheating a bit – obviously you need a good source of random numbers for this which makes it hard to predict the private key). I have done that for you:

$d = 103028256105408389446438916672504271192164767440296751065327418112299269382535$

Of course that is not a private key that I really use – it would not be smart to publish it if it were. But it is a perfectly valid private key. Unfortunately, this is not the way how a private key is typically stored and presented by a bitcoin client. In fact, what I did to create this key is to run the commands


$ bitcoin-cli -regtest getnewaddress "myAccount"

mx5zVKcjohqsu4G8KJ83esVxN52XiMvGTY

$ bitcoin-cli -regtest dumpprivkey "mx5zVKcjohqsu4G8KJ83esVxN52XiMvGTY"

cVDUgUEahS1swavidSk1zdSHQpCy1Ac9XSQHkaxmZKcTTfEA5vTY

after installing the reference bitcoin core software on my computer. The last line is the private key – and that looks very much different from the number d above (you do not necessarily need a bitcoin installation to follow this post, but you will definitely need it to run all the examples in future posts, so this might be a good point in time to stop and install it. So go to the download page, get the version for your machine and install it. Do not forget to start the bitcoin daemon in regtest mode, if you want to avoid downloading the full blockchain with more than 200 GBytes at the time of writing. In my bitcoin configuration file bitcoin.conf located in the directory ~/.bitcoin, I have set the options

regtest=1
server=1

to tell bitcoin to accept RPC commands and to run in regression test mode. But back to keys now …).

The funny string that the bitcoin client will present you as the private key is in fact an encoded private key using the WIF format (wallet interchange format). Let us try to understand how we can convert this into the number d displayed above.

The first thing you need to know about a WIF encoded private key is that it is encoded using the Base58 standard. Similar to Base64 (which an official IETF standard described in RFC4648), this is a standard to encode a number in a way that can easily be transmitted over channels like e-mail or even printed on paper without having to deal with binary values. Essentially, the idea is that we use an alphabet of 58 ASCII characters and to convert the number to the base 58, representing each digit by the corresponding character from this alphabet. In addition, there is some logic to handle leading zeros, more precisely to avoid that they are dropped during the conversion. If you want to see all this in detail, the authorative answer is (as always) the source code of the module base58c.cpp in the C++ reference implementation which is hosted on GitHub.

To do this in Python, we have – as always – several choices. We can search for a library that performs Base58 encoding and decoding – for instance https://github.com/keis/base58. For the sake of demonstration, I have created my own routines. To decode, i.e. to turn a Base58 string into a sequence of bytes, the following code will do.


BASE58_ALPHABET = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz'
def base58_decode(s):
    #
    # Strip off leading 1's as these represent leading
    # zeros in the original
    #
    zeros = 0
    while (zeros < len(s)) and (s[zeros] == '1'):
        zeros = zeros + 1
    s = s[zeros:]
    #
    # We first turn the string into an integer
    #
    value, power = 0, 1
    for _ in reversed(s):
        value += power * BASE58_ALPHABET.index(_)
        power = power * 58
    #
    # Now convert this integer into a sequence of bytes
    #
    result = value.to_bytes((value.bit_length() + 7) // 8, byteorder='big')
    #
    # and append the leading zeros again
    #
    for _ in range(zeros):
        result = (0).to_bytes(1, 'big') + result
    return result

I have stored this in a module btc.utils for later use, which you can find, along with the other examples from this post, in my GitHub repository.

Let us now apply this to our example WIF file.

import binascii
import btc.utils

#
# The WIF encoded private key
#
wif = "cVDUgUEahS1swavidSk1zdSHQpCy1Ac9XSQHkaxmZKcTTfEA5vTY"
print("WIF:    ", wif)
#
# Convert into a sequence of bytes
#
b = btc.utils.base58_decode(wif)
#
# and into hex
#
h = binascii.hexlify(b).decode('ascii')
print("Hex:    ", h)

If you run this code, you will find that the output has 76 characters, i.e. 76 / 2 = 38 bytes. That cannot be quite right, because we expect that our private key is an integer with 32 bytes only. So there are six extra bytes. Where do they come from?

To get used to it, let us again try to find the answer in the source code of the reference implementation (you can browse the code in the GitHub repository online or (recommended) clone to obtain a local copy so that you can use tools like grep and their friends. As a starting point, remember that we have received our private key via the bitcoin-cli tool that communicates with the bitcoin server via RPC calls, and that we have used the RPC method dumpprivatekey. So let us search for that.


(cd bitcoin/src ; grep -R -I "dumpprivkey" *)

That will give you a few matches in two files,wallet/rcpwallet.cpp and wallet/rpcdump.cpp. If you open these two files and look at the code, you will find that the former refers to the latter. This function first retrieves the key from the wallet and then creates an instance of the class CBitcoinSecret (which is derived from CBase58Data and invokes its ToString() method to obtain the textual representation of the key that is then returned.

One more usage of grep will tell you that the code for these classes is located in the file base58.cpp which we have already met. The constructor calls CBitcoinSecret::SetKey() and the method ToString() is implemented in the base class, so we also need to look at CBase58Data::ToString(). Going carefully through this code, we find that the data is actually composed of four parts.

The first byte is a version number which is used to distinguish a private key used for the testnet (239) from a key for the productive network (128) (look up the values in chainparams.cpp). The next few bytes are the actual secret d, encoded as a hexadecimal string using big endian encoding (i.e. the most significant octet first). The next byte is a flag that describes whether the public key that belongs to this private key should be stored in compressed format or not. We will get back to this point in a later post on public keys and addresses, for the time being we can safely ignore this byte.

The last four bytes are again interesting. They form a checksum for the remainder. These four bytes are obtained by applying what is called a Hash256 in the bitcoin language and which is just a double SHA256 hash, and then taking the first four bytes of the result. Thus in order to turn the WIF string into a number, we have to decode it using Base58, strip off the last four bytes, verify the checksum, strip off one additional byte, remove the trailing version number and convert the remaining hexadecimal string into an integer.

import binascii
import btc.utils
import hashlib

def hash256(s):
    return hashlib.sha256(hashlib.sha256(s).digest()).digest()

#
# The WIF encoded private key
#
wif = "cVDUgUEahS1swavidSk1zdSHQpCy1Ac9XSQHkaxmZKcTTfEA5vTY"
print("WIF:       ", wif)
#
# Convert into a sequence of bytes
#
b = btc.utils.base58_decode(wif)
#
# and into hex
#
h = binascii.hexlify(b).decode('ascii')
print("Hex:       ", h)
#
# Strip off checksum
#
chk = h[-8:]
print("Checksum:  ", chk)
h = h[:-8]
#
# and verify it
#
_chk = hash256(bytes.fromhex(h))[:4]
assert(_chk == bytes.fromhex(chk))
#
# Strip off version byte
#
print("Version:   ", int(h[:2], 16))
h = h[2:]
#
# and compression flag
#
h = h[:-2]

d = int.from_bytes(bytes.fromhex(h), "big")
print("Secret:    ", d)

If you run this, you should get the value for d that we started with at the beginning of this post.

We have now seen how private keys can be generated and encoded. Typically private keys are kept in a wallet, but they could also be printed out in their WIF encoded form and stored offline – you could even create a private key on a machine not connected to any network and store it in this way. This is called a paper wallet in the bitcoin world.

But what about the public key? If you want to receive bitcoins, the payee needs access to your public key or at leat to a condensed version of it called the bitcoin address. We have seen that for the ECDSA algorithm, the public key can be calculated given the private key, and we will see that the address can in turn be calculated given the public key. In my next post, I will guide you through this process.

A primer on elliptic curve cryptography: practice

In the last post, we have looked a bit at the theory behind elliptic curves. In this post, we will now see how all this works down to earth and use Python to actually run some calculations.

The first thing that we need is an explicit formula for the addition of two points on an elliptic curve. We will not derive this here, but simply give you the result – see for instance [1] for more details. Given two points $(x_1, y_1)$ and $(x_2, y_2)$ , the coordinates of their sum $(x_3, y_3)$ can be determined as follows.


inv = inv_mod_p(x2 - x1, p)
x3 = ((y2 - y1)*inv)**2 - x1 - x2
y3 = (y2 - y1)*inv*(x1 - x3) - y1

Here we assume that we have a function inv_mod that will give us the inverse modulo some prime number p which is the number of elements of our base field. Typically, this can be done using the so-called extended euclidian algorithm.

However, there are a few special cases we need to consider. This is apparent if we look at this formula in more detail – what happens if the inverse does not exist because the points $x_1$ and $x_2$ are equal?

This happens if we want to add two points that have the same x-coordinate. There are two cases we need to consider. First, the y-coordinates could be equal as well. Then we are trying to add a point to itself, and we need to apply the formula provided in [1] for that special case. Or the y-coordinate of the second point is minus the y-coordinate of the first point. Then we try to add a point to its own inverse. The result will be the neutral element of the group which is usually called the point at infinity (there is a reason for this: if we embedd the curve into a projective plane, this point will in fact be the intersection of its completion with the line at infinity in the projective plane…).

To describe points on an elliptic curve, we need both coordinates, the x and the y-coordinate. Thus it makes sense to implement a class for this purpose. Instances of this class need to store the x- and y-coordinates and – for convenience – a boolean that tells us whether a point is the point at infinity. So our full code for this class looks as follows.


class CurvePoint:

    def __init__(self, x, y, infinity = False):
        self.x = x
        self.y = y
        self.infinity = infinity

    def __add__(self, other):
        #
        # Capture trivial cases - one of the points is infinity
        #
        if self.infinity:
            return other
        if other.infinity:
            return self
        #
        # First check whether we are adding or doubling
        #
        x1 = self.x
        x2 = other.x
        y1 = self.y
        y2 = other.y
        infinity = False
        if (x1 - x2) % p == 0:
            #
            # Are we talking about doubling or addition
            # of the inverse?
            #
            if (y1 + y2) % p == 0:
                infinity = True
                x3 = 0
                y3 = 0
            else:
                inv = inv_mod_p(2*y1, p)
                x3 = (inv*(3*x1**2 + a))**2 - 2*x1
                y3 = (inv*(3*x1**2 + a))*(x1 - x3) - y1
        else:
            #
            # Standard case
            #
            inv = inv_mod_p(x2 - x1, p)
            x3 = ((y2 - y1)*inv)**2 - x1 - x2
            y3 = (y2 - y1)*inv*(x1 - x3) - y1

        return CurvePoint(x3 % p, y3 % p, infinity)

As already mentioned, that assumes that you have a function inv_mod_p in your namespace to compute the inverse modulo p. It also assumes that the variables p, a and b that describe the curve parameters are somewhere in your global namespace (of course you could introduce a class to represent a curve that stores all this, but let us keep it quick and dirty at this point).

Now having these routines, we can actually do a few example and verify that the outcome is as expected. We use a few examples with low values of p from [1] .

#
# Define curve parameters
#

p = 29
a = 4
b = 20
#
# and add a few points
#
A = CurvePoint(5,22)
B = CurvePoint(16, 27)
O = CurvePoint(0,0,infinity=True)
C = A + B
assert(C.x == 13)
assert(C.y == 6)
assert(C.infinity == False)
C = A + A
assert(C.x == 14)
assert(C.y == 6)
assert(C.infinity == False)
A = CurvePoint(17,19)
B = CurvePoint(17,10)
C = A + B
assert(C.infinity == True)
A = B + O
assert(A.x == B.x)
assert(A.y == B.y)
assert(A.infinity == B.infinity)
A = O + B
assert(A.x == B.x)
assert(A.y == B.y)
assert(A.infinity == B.infinity)

For the sake of demonstration, we have shown how to build elliptic curve arithmetic from scratch. We could now proceed to implement multiplication and the ECDS algorithm ourselves, but as so often, there is a Python library that will do this for us. Well, there is probably more than one, but I like the Python ECDSA library maintained by Brian Warner.

The most basic classes in this library are – you might have guessed that- curves and points. Curves are initialized providing the basic parameters p, a and b. Then points are created by specifying a curve and the x and y coordinates of the points. Thanks to operator overloading, points can then be added and multiplied with integers using standard syntax. Here is a code snippet that reproduces the first of our examples from above using the ECDSA library.


import ecdsa

#
# Create a curve with parameters p,a and b
# with the ECDSA library
#
curve = ecdsa.ellipticcurve.CurveFp(p,a,b)
#
# Define two points and add them
#
A = ecdsa.ellipticcurve.Point(curve, 5, 22)
B = ecdsa.ellipticcurve.Point(curve, 16, 27)
C = A + B
assert(C.x() == 13)
assert(C.y() == 6)
assert(C != ecdsa.ellipticcurve.INFINITY)

Very easy – and very useful. And of course reassuring to see that we get the same result than our hand-crafted code for the arithmetic above gave us.

But this is not yet all, the library can of course do much more. Let us use it to create a signature. For that purpose, we obviously need a reasonable large value for the prime p, otherwise we could easily use a brute-force attack to determine our private key from the public key. In my previous post on the theoretical foundations, I have already mentioned the papers published by the SECG, the Standards for efficient cryptography group. This group has published some standard curves that we can use. One of them is the curve SECP256K1 which is a curve over a prime field $F_p$ with

$p = 2^{256} - 2^{32} - 2^{9} - 2^{8} - 2^{7}-2^{6} - 2^{4} - 1 = 115792089237316195423570985008687907853269984665640564039457584007908834671663$

This curve is the curve which is used by the bitcoin protocol. As many other standardized curves, it is hard-coded in the ECDSA library. To get it, use the following code.


#
# Get the standard curve SECP256K1
# and its parameters
#
curve = ecdsa.curves.SECP256k1
G = curve.generator
p = curve.curve.p()
a = curve.curve.a()
b = curve.curve.b()
n = G.order()

Let us now apply what we have learned about signatures. First, we need a private key and a public key determined from it. Thus we pick a random number that will be our secret and multiply the generator of the curve by it to get our public key.

#
# Determine a private key and a public key
#
d = ecdsa.util.randrange(n-1)
Q = d*G
pKey = ecdsa.ecdsa.Public_key(G, Q)
sKey = ecdsa.ecdsa.Private_key(pKey, d)

Next we need a hash value that we will sign. Usually, we would derive this value from a message using some cryptographic hash function like SHA256, but we will simply simulate this by drawing a random number h. We can then use the method sign of the ECDSA private key object to create a signature. This will return a signature object from which we can retrieve the values r and s.

h = ecdsa.util.randrange(n-1)
k = ecdsa.util.randrange(n-1)
signature = sKey.sign(h, k)
r = signature.r
s = signature.s

Let us verify that the algorithm really works as described the previous post. So we first need to multiply our randomly chosen integer k with the generator of the curve, the number r should then be the x-coordinate of this point. We then invert k modulo n and multiply the result by h + dr. This should give us s.

_r = (k*G).x() % n
assert(_r == r)

w = inv_mod_p(k, n)
_s = ((h+d*r)*w) % n
assert(_s == s)

If you run this code, you will hopefully see that the assertions pass. Our code works, and produces the same result as the ECDSA library. That is already reassuring. Having gone so far, we can now of course also verify the signature – again we do this once using the ECDSA library and once using our own code.

#
# Now we manually verify the signature
#
w = inv_mod_p(s, n)
assert(1 == (w*s % n))
u1 = w * h % n
u2 = w * r % n
X = u1*G + u2*Q
assert(X.x() == r)

#
# Finally we verify the signature using the
# lib
#
assert(pKey.verifies(h, signature) == True)

That is it for today. We have seen how elliptic curves can be used in practice to create and verify digital signatures and have looked at the ECDSA library that offers ready made functions for that purpose. The full source code can by downloaded from GitHub.

In my next post, I will look at the way how private and public ECDSA keys appear in the bitcoin protocol. If you want to learn more and play around a bit with elliptic curves in the meantime, I recommend the online tool [2]. You might also want to take a look at Andrea Corbellinis excellent post on elliptic curve cryptography.

1. D. Hankerson, A. Menezes, S. Vanstone,
Guide to elliptic curve cryptography, Springer, New York 2004
2. Elliptic curve point addition online tool at https://cdn.rawgit.com/andreacorbellini/ecc/920b29a/interactive/modk-add.html