# Keys in the bitcoin network: the public key

In my last post, we have looked in some detail at the private key – how it is generated and how it can be decoded and stored. Let us now do the same with the public key.

Recall that a public key is simply a point on the elliptic curve SECP256K1 that is used by the underlying ECDSA algorithm – in fact it is obtained by multiplying the generator point on the curve by our private key. As any point on the curve, it therefore has an x-coordinate and a y-coordinate, both being 32 bytes unsigned integers.  So one way to encode the public key would be as follows.

• take the x-coordinate as a point, represented by an integer smaller than p
• convert this into a 32 byte hexadecimal string, using for instance big endian encoding
• do the same for the y-coordinate
• and concatenate these two strings to obtain a single 64 byte hexadecimal string

This encoding is simple, but it has a drawback. Remember that we encode not just a random pair of integers, but a point on the curve, so the x-coordinate and y-coordinate are related by the curve equation

$y^2 = x^3 + ax + b$

Thus given x, we almost know y – we know the square of y modulo p, and there can be at most two different roots of this equation. So we could reconstruct y if we have x and an additional bit that tells us which of the two solutions we need.

Let us now assume that p is odd. If y is a solution of the equation for a given value of x, then p – y (which is -y modulo p) is the second solution. As p is odd, exactly one of the two numbers y and p – y is even. We can therefore use an additional bit that is equal to y modulo 2 to distinguish the two solutions. It is convention to store this bit in a full additional byte, using the value 2 if y is even and the value 3 if y is odd, so that we obtain a representation of the public key (and in fact any other point on the curve) in at most 33 bytes: at most 32 bytes for the value of the x-coordinate and the additional byte containing the value of y modulo 2. This representation is called the compressed representation (see for instance the publication of the SECG, section 2.3).

If there is a compressed representation, you might expect that there is also an uncompressed representation. This is simply the representation that we have described above, i.e. storing both x and y, with an additional twist: to be able to distinguish this from a compressed representation that always starts with 0x02 or 0x03, a leading byte with value 0x04 is added so that the total length of an uncompressed representation is at most 65 bytes. Since version 0.6.0, the bitcoin reference implementation defaults to using compressed keys (see the function CWallet::GenerateNewKey).

Let us summarize what we have learned so far in a short Python code snippet that will take a private key (stored as integer in the variable d), calculate the corresponding point on the elliptic curve SECP256K1 using the ECDSA library and create a compressed representation of the result.

#
# Determine the public key from the
# secret d
#
import ecdsa
curve = ecdsa.curves.SECP256k1
Q = d * curve.generator
#
# and assemble the compressed representation
#
x = Q.x()
y = Q.y()
pubKey = x.to_bytes(length=32, byteorder="big")
pubKey = binascii.hexlify(pubKey).decode('ascii')
if 1 == (y % 2):
pubKey = "03" + pubKey
else:
pubKey = "02" + pubKey
print("Compressed key:  ", pubKey)

This way of encoding a public key is in fact not specific to the bitcoin network, but a standard that is used whenever a point on an elliptic curve needs to be encoded – see for instance RFC5480 by the IETF which is part of the X.509 standard for certificates.

However, this is still a bit confusing. If you known the nuts and bolts of the bitcoin protocol a bit, you will have seen that participants publish something that is called an address which is a string similar to

mx5zVKcjohqsu4G8KJ83esVxN52XiMvGTY

That does not look at all like a compressed or uncompressed public key. We are missing something.

The answer is that an address is in fact not a public key, but it is derived from a public key. More precisely, it is an encoded version of a hash value of the public key. So given the address, it is easy to verify that this address belongs to a certain public key, but it is very hard to reconstruct the public key given the address.

To understand the relation between a public key and an address better, it is again time to take a look at the source code of the reference client. A good starting point is the RPC method getnewaddress. This method is defined in the file wallet/rpcwallet.cpp and creates an instance of the class CBitcoinAddress which in turn is – surprise – derived from our old friend CBase58Data. The comments are quite helpful, and it is not difficult to figure out that a bitcoin address is obtained as follows from a public key.

• create a hexadecimal compressed representation of the public key
• apply a double hash to turn this into a sequence of 20 bytes – first apply the hash algorithm SHA256, then RIPEMD160 (this is called a Hash160 in the bitcoin terminology as the output will have 160 bits)
• add a prefix to mark this as a public key address – the prefix is again defined in chainparams.cpp and and is zero for the main network and 111 for the test networks
• take the hash256 checksum and append the first four bytes
• apply Base58 decoding to the result

This is already very similar to what we have seen before and can be done in a few lines of Python code.

def hash160(s):
_sha256 = hashlib.sha256(s).digest()
return hashlib.new("ripemd160", _sha256).digest()
#
# Apply hash160
#
keyId = hash160(bytes.fromhex(pubKey))
#
# Append prefix for regtest network
#
#
#
#
# and encode
#