In my last post, I showed you how arithmetic on elliptic curves can be used to create and verify digital signatures. We have seen that every party that creates a signature is represented by a private key, which is kept secret, and a public key, which is made available to everyone who wants to verify the signature. In a blockchain, digital signatures are used to verify ownership of bitcoins, and therefore private and public keys play a pivotal role in the bitcoin network. Bitcoin transaction outputs refer to public keys, and only the person who controls the matching private key can spend the bitcoin. Thus it is worth taking a closer look at how keys are represented in the bitcoin protocol.
Bitcoin uses the ECDSA signature algorithm to sign messages and verify signatures. Therefore a private key in the bitcoin network is simply an integer. More precisely, it is an integer that is at least one and strictly smaller than the order n of the generator of the elliptic curve secp256k1 (if that sounds like gibberish to you, you should read my previous post on this). Generating private keys is therefore very easy: you simply select random 32-byte integers until you find one that is nonzero and below that order (I am cheating a bit here; obviously you need a good source of random numbers, so that the private key is hard to predict). I have done that for you:
Of course that is not a private key that I really use; it would not be smart to publish it if it were. But it is a perfectly valid private key. Unfortunately, this is not the way a private key is typically stored and presented by a bitcoin client. In fact, what I did to create this key was to run the commands:
```shell
$ bitcoin-cli -regtest getnewaddress "myAccount"
mx5zVKcjohqsu4G8KJ83esVxN52XiMvGTY
$ bitcoin-cli -regtest dumpprivkey "mx5zVKcjohqsu4G8KJ83esVxN52XiMvGTY"
cVDUgUEahS1swavidSk1zdSHQpCy1Ac9XSQHkaxmZKcTTfEA5vTY
```
after installing the reference bitcoin core software on my computer. The last line is the private key, and it looks very different from the number d above. (You do not necessarily need a bitcoin installation to follow this post, but you will definitely need it to run all the examples in future posts, so this might be a good point in time to stop and install it. Go to the download page, get the version for your machine and install it. Do not forget to start the bitcoin daemon in regtest mode if you want to avoid downloading the full blockchain, which is more than 200 GBytes at the time of writing. In my bitcoin configuration file bitcoin.conf, located in the directory ~/.bitcoin, I have set the options

```
regtest=1
server=1
```

to tell bitcoin to run in regression test mode and to accept RPC commands. But back to keys now …)
The funny string that the bitcoin client presents to you as the private key is in fact a private key encoded in the WIF format (Wallet Import Format). Let us try to understand how we can convert it into the number d displayed above.
The first thing you need to know about a WIF-encoded private key is that it is encoded using the Base58 standard. Similar to Base64 (which is an official IETF standard described in RFC 4648), Base58 is a way to encode a number so that it can easily be transmitted over channels like e-mail, or even printed on paper, without having to deal with binary values. Essentially, the idea is to use an alphabet of 58 ASCII characters (the easily confused characters 0, O, I and l are left out), convert the number to base 58, and represent each digit by the corresponding character from this alphabet. In addition, there is some logic to handle leading zeros, more precisely to prevent them from being dropped during the conversion: each leading zero byte is represented by the character 1, the first character of the alphabet. If you want to see all this in detail, the authoritative answer is (as always) the source code of the module base58.cpp in the C++ reference implementation, which is hosted on GitHub.
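To make the conversion concrete, here is a minimal sketch of the encoding direction in Python. The function name base58_encode is my own choice for illustration; it assumes the bitcoin Base58 alphabet and mirrors the rules just described: leading zero bytes become the character 1, and the rest of the data is treated as a big-endian integer and written in base 58.

```python
BASE58_ALPHABET = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz'


def base58_encode(data: bytes) -> str:
    """Encode bytes as Base58, keeping leading zero bytes as '1' characters."""
    # Count leading zero bytes; each one becomes the first alphabet symbol '1'
    zeros = len(data) - len(data.lstrip(b'\x00'))
    # Interpret the bytes as a big-endian integer and convert it to base 58
    value = int.from_bytes(data, 'big')
    digits = []
    while value > 0:
        value, remainder = divmod(value, 58)
        digits.append(BASE58_ALPHABET[remainder])
    # Digits were produced least significant first, so reverse them
    return '1' * zeros + ''.join(reversed(digits))


print(base58_encode(b'\x00\x00\x01'))  # prints 112 (two zero bytes, then the value 1)
print(base58_encode(bytes([58])))      # prints 21 (58 = 1 * 58 + 0)
```

Decoding simply reverses these two steps: count and strip the leading 1s, then turn the remaining digits back into an integer.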
To do this in Python, we have – as always – several choices. We can search for a library that performs Base58 encoding and decoding – for instance https://github.com/keis/base58. For the sake of demonstration, I have created my own routines. To decode, i.e. to turn a Base58 string into a sequence of bytes, the following code will do.
```python
BASE58_ALPHABET = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz'


def base58_decode(s):
    #
    # Strip off leading 1's as these represent leading
    # zeros in the original
    #
    zeros = 0
    while (zeros < len(s)) and (s[zeros] == '1'):
        zeros = zeros + 1
    s = s[zeros:]
    #
    # We first turn the string into an integer
    #
    value, power = 0, 1
    for _ in reversed(s):
        value += power * BASE58_ALPHABET.index(_)
        power = power * 58
    #
    # Now convert this integer into a sequence of bytes
    #
    result = value.to_bytes((value.bit_length() + 7) // 8, byteorder='big')
    #
    # and prepend the leading zeros again
    #
    for _ in range(zeros):
        result = (0).to_bytes(1, 'big') + result
    return result
```
I have stored this in a module btc.utils for later use, which you can find, along with the other examples from this post, in my GitHub repository.
Let us now apply this to our example WIF string.
```python
import binascii
import btc.utils

#
# The WIF encoded private key
#
wif = "cVDUgUEahS1swavidSk1zdSHQpCy1Ac9XSQHkaxmZKcTTfEA5vTY"
print("WIF: ", wif)
#
# Convert into a sequence of bytes
#
b = btc.utils.base58_decode(wif)
#
# and into hex
#
h = binascii.hexlify(b).decode('ascii')
print("Hex: ", h)
```
If you run this code, you will find that the hex string has 76 characters, i.e. 76 / 2 = 38 bytes. That cannot be quite right, because we expect our private key to be an integer of only 32 bytes. So there are six extra bytes. Where do they come from?
To get used to working with the reference implementation, let us again try to find the answer in its source code. You can browse the code in the GitHub repository online or (recommended) clone it to obtain a local copy, so that you can use tools like grep and friends. As a starting point, remember that we received our private key via the bitcoin-cli tool, which communicates with the bitcoin server via RPC calls, and that we used the RPC method dumpprivkey. So let us search for that.
```shell
(cd bitcoin/src ; grep -R -I "dumpprivkey" *)
```
That will give you a few matches in two files, wallet/rpcwallet.cpp and wallet/rpcdump.cpp. If you open these two files and look at the code, you will find that the former refers to the latter, which contains the function dumpprivkey. This function first retrieves the key from the wallet, then creates an instance of the class CBitcoinSecret (which is derived from CBase58Data) and invokes its ToString() method to obtain the textual representation of the key, which is then returned.
One more usage of grep will tell you that the code for these classes is located in the file base58.cpp, which we have already met. The constructor calls CBitcoinSecret::SetKey(), and the method ToString() is implemented in the base class, so we also need to look at CBase58Data::ToString(). Going carefully through this code, we find that the data is actually composed of four parts.
The first byte is a version number, which is used to distinguish a private key for the test network (239, also used in regtest mode) from a key for the main network (128); you can look up these values in chainparams.cpp. The next 32 bytes are the actual secret d, stored in big-endian byte order (i.e. the most significant byte first). The next byte is a flag indicating that the public key belonging to this private key should be stored in compressed format; it has the value 0x01 and is only present for compressed keys. We will get back to this point in a later post on public keys and addresses; for the time being we can safely ignore this byte.
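The layout just described can be captured in a few lines. The following is a minimal sketch, not code from the reference implementation: the helper split_payload and the WifPayload tuple are hypothetical names, and the sketch assumes that the checksum has already been removed, so that the input is the version byte, the 32-byte secret and the optional compression flag.

```python
from typing import NamedTuple

# Version bytes for WIF private keys as described above
# (128 for mainnet, 239 for testnet and regtest)
NETWORKS = {128: "mainnet", 239: "testnet/regtest"}


class WifPayload(NamedTuple):
    network: str
    secret: int
    compressed: bool


def split_payload(payload: bytes) -> WifPayload:
    """Split a WIF payload (checksum already removed) into its parts."""
    if len(payload) == 34 and payload[-1] == 0x01:
        compressed = True          # trailing 0x01 marks a compressed key
    elif len(payload) == 33:
        compressed = False         # no flag byte for uncompressed keys
    else:
        raise ValueError("unexpected payload length or compression flag")
    version = payload[0]
    if version not in NETWORKS:
        raise ValueError(f"unknown version byte {version}")
    # Bytes 1 to 32 hold the secret d, most significant byte first
    secret = int.from_bytes(payload[1:33], "big")
    return WifPayload(NETWORKS[version], secret, compressed)


# A hypothetical testnet payload for the (insecure!) secret d = 1
example = bytes([239]) + (1).to_bytes(32, "big") + b"\x01"
print(split_payload(example))
```

Checking the length together with the flag value makes the function reject malformed input instead of silently treating the last byte of the secret as a flag.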
The last four bytes are again interesting. They form a checksum over the preceding bytes. These four bytes are obtained by applying what is called Hash256 in bitcoin terminology, which is just a double SHA256 hash, and then taking the first four bytes of the result. Thus, in order to turn the WIF string into a number, we have to decode it using Base58, strip off the last four bytes and verify the checksum, remove the leading version byte and the trailing compression flag, and convert the remaining 32 bytes into an integer.
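Running these steps backwards gives us a WIF encoder. The sketch below is my own illustration, not the reference implementation: encode_wif is a hypothetical name, its default arguments (version 239, compressed key) are assumptions chosen to match the regtest key used in this post, and it repeats a small Base58 encoder so that the block is self-contained.

```python
import hashlib

BASE58_ALPHABET = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz'


def hash256(data: bytes) -> bytes:
    """Double SHA256, called Hash256 in bitcoin."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def base58_encode(data: bytes) -> str:
    """Base58 encoding; each leading zero byte becomes a '1'."""
    zeros = len(data) - len(data.lstrip(b'\x00'))
    value = int.from_bytes(data, 'big')
    digits = []
    while value > 0:
        value, remainder = divmod(value, 58)
        digits.append(BASE58_ALPHABET[remainder])
    return '1' * zeros + ''.join(reversed(digits))


def encode_wif(d: int, version: int = 239, compressed: bool = True) -> str:
    """Assemble version || secret || [0x01] || checksum and Base58-encode it."""
    payload = bytes([version]) + d.to_bytes(32, 'big')
    if compressed:
        payload += b'\x01'
    # The checksum is the first four bytes of Hash256 over everything before it
    checksum = hash256(payload)[:4]
    return base58_encode(payload + checksum)


# Toy secret d = 1, never use this for real funds
print(encode_wif(1, version=239, compressed=True))
print(encode_wif(1, version=128, compressed=False))
```

If the decoding described here is correct, passing the secret d recovered from the example WIF string back into encode_wif with version 239 and compressed=True should reproduce that string.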
```python
import binascii
import btc.utils
import hashlib


def hash256(s):
    return hashlib.sha256(hashlib.sha256(s).digest()).digest()

#
# The WIF encoded private key
#
wif = "cVDUgUEahS1swavidSk1zdSHQpCy1Ac9XSQHkaxmZKcTTfEA5vTY"
print("WIF: ", wif)
#
# Convert into a sequence of bytes
#
b = btc.utils.base58_decode(wif)
#
# and into hex
#
h = binascii.hexlify(b).decode('ascii')
print("Hex: ", h)
#
# Strip off checksum
#
chk = h[-8:]
print("Checksum: ", chk)
h = h[:-8]
#
# and verify it
#
_chk = hash256(bytes.fromhex(h))[:4]
assert(_chk == bytes.fromhex(chk))
#
# Strip off version byte
#
print("Version: ", int(h[:2], 16))
h = h[2:]
#
# and compression flag
#
h = h[:-2]
d = int.from_bytes(bytes.fromhex(h), "big")
print("Secret: ", d)
```
If you run this, you should get the value for d that we started with at the beginning of this post.
We have now seen how private keys can be generated and encoded. Typically private keys are kept in a wallet, but they could also be printed out in their WIF encoded form and stored offline – you could even create a private key on a machine not connected to any network and store it in this way. This is called a paper wallet in the bitcoin world.
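Creating a key for such an offline paper wallet boils down to the random selection described at the beginning of this post. Here is a minimal sketch, assuming Python's secrets module as the source of randomness (it draws from the operating system's cryptographically secure generator); the function name generate_private_key is hypothetical, and the constant is the order n of the secp256k1 generator as published in the SEC 2 curve parameters.

```python
import secrets

# Order n of the secp256k1 generator point (SEC 2 domain parameters)
SECP256K1_ORDER = 0xFFFFFFFF_FFFFFFFF_FFFFFFFF_FFFFFFFE_BAAEDCE6_AF48A03B_BFD25E8C_D0364141


def generate_private_key() -> int:
    """Draw random 32-byte integers until one lies in the range 1 to n - 1."""
    while True:
        candidate = int.from_bytes(secrets.token_bytes(32), "big")
        # Reject zero and anything not below the order (this is very rare)
        if 1 <= candidate < SECP256K1_ORDER:
            return candidate


d = generate_private_key()
print("Private key d:   ", d)
print("As 64 hex digits:", format(d, "064x"))
```

Rejecting out-of-range candidates, rather than reducing them modulo n, keeps every valid key equally likely.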
But what about the public key? If you want to receive bitcoins, the payer needs access to your public key, or at least to a condensed version of it called the bitcoin address. We have seen that for the ECDSA algorithm, the public key can be calculated from the private key, and we will see that the address can in turn be calculated from the public key. In my next post, I will guide you through this process.