September 2021 – LeftAsExercise

Non-fungible token (NFT) and the ERC-721 standard

NFTs (non-fungible token) are one of the latest and most exciting developments in the blockchain universe, with use cases ranging from such essential things as breeding cute virtual kitten digitally on the blockchain to digital auctions as conducted earlier this year by the renowned fine art auctioneer Sothebys’s. In this post, we will explain why an NFT is nothing but a smart contract with specific functionality and talk about the ERC-721 standard that formally defines NFTs.

Non-fungible token

In a previous post in this series, we have looked at token according to the ERC-20 standard. We have seen that in its essence, a token is implemented by a smart contract that is maintaining a mapping of accounts to balances to track ownership in a digital currency.

As for a traditional currency, documenting ownership by just keeping track of how many token you own works because such a token is completely fungible – any two are the same. If you hold a token, say the Chainlink (LINK) token, the blockchain records a balance, say 100 LINK, and if you transfer 20 LINK to another account, it does not make sense to ask which of the 100 LINK you have transferred.

This is a good approach to model a currency, but sometimes, you want to achieve something else – you want to document ownership in a uniquely identifyable asset, say a piece of art, or a property. To do this, you would assign a unique ID to each asset, and then keep track of who owns which asset by maintaining a mapping of asset IDs to owners. This is more or less what a non-fungible token does.

Correspondingly, a non-fungible token contract (NFT contract) is essentially a smart contract that maintains a data structure to document ownership in a specific item, modeled as a mapping from item IDs (the so-called token ID) to the current owner. Suppose, for instance, an artist releases a collection of digital pieces of art, numbered from 1 to 100, and sells them as NFTs. Then, the token ID would range from 1 to 100, every ID would represent the corresponding piece of art, and the mapping would document who owns which item.

Apart from the mapping itself, the contract would also have to offer methods to transfer ownership, say a method transfer that would accept a token ID and the new owner as input and would adjust the mapping to update the owner accordingly. You could also come up again with an approach to pre-approve transfers so that the new owner could actively call into the contract to claim ownership, and, in addition, you would probably add a few convenience functions, for instance to figure out the current owner for a given token ID.

The ERC-721 standard

Similar to the ERC-20 standard, the community has developed a standard – the ERC-721 standard – for smart contracts representing NFTs. However, the ERC-721 standard is considerably more complicated than the ERC-20 standard. Here is an overview of its methods and events.

That might look a bit intimidating, but do not worry – we will go through each of the building blocks step by step, and things will fall into place.

Let us start with balances. ERC-721 defines two different approaches to balances. First, there is the ownerOf method which simply returns the owner of a specific asset, identified by the token ID. Essentially, this simply queries the mapping of token IDs to current owners that the contract maintains. Here, the token ID is a 256-bit integer, i.e. a uint256 in Solidity. In addition, there is still a balanceOf method that returns the total number of NFTs owned by a specific account (this is useful in combination with the enumeration features that we will discuss below).

Next, let us start to discuss transfers. The easiest way to initiate a token transfer is to use the method transferFrom. This method allows a caller to transfer a given token, identified by the token ID, from its current holder to different address. Of course, the sender of the message needs to be authorized to perform the transfer – the current owner of a token is always authorized, but there are more options, we will get to this point below.

This is essentially the same structure and logic as the transfer method of a fungible token according to ERC20. However, there is a risk associated with using this method. Suppose that you use this method to transfer a token to a certain address, and that the receiver is a smart contract. Then the NFT will be transferred to the smart contract, and only a transaction originating from the smart contract can transfer the token to another account. If the smart contract is not prepared for this, i.e. if it does not have a method to initiate such a transfer, the NFT is forever lost (unless, maybe, the contract is an upgradeable contract) and therefore the NFT will remain assigned to the contract forever.

To at least partially mitigate this risk, the ERC-721 standard encourages contracts that are capable of managing NFTs to make this visible by implementing a marker interface. Specifically, a contract that is prepared to receive NFT should implement a method

function onERC721Received(address operator, address from, uint256 tokenId, bytes data) external returns(bytes4);

The idea behind this is similar to the receive and fallback functions in Solidity. Of course, the pure presence of this function does not say anything about its implementation, but at least it indicates that the author of the contract was aware of the possibility that the contract might receive an NFT.

In order to restrict transfers to transfers to either an EOA or a contract that implements the marker interface, an NFT contract offers the method safeTransferFrom. This method is very similar to an ordinary transfer, with two exceptions. First, it is supposed to check whether the receiving address is a smart contract (or, which is not exactly the same thing, has a non-zero code). If yes, it will try to invoke the method onERC721Received of the target contract which is supposed to return a defined sequence of four bytes (a “magic value”). If the target contract does not implement the method, or the method exists but returns a different value, then the transfer will fail.

Second, the method safeTransferFrom optionally accepts a data field that can contain an arbitrary sequence of bytes which is handed over to onERC721Received of the recipient. The target contract can then, for instance, log this data or perform some other operations like updating a balance and storing the passed data as a reference.

Let us now turn to authorizations – who is allowed to initiate a transfer? Of course, the owner of an NFT is always authorized to transfer it. In addition to this, the withdrawal pattern is supported as well, similar to the ERC20 standard. In fact, there is a method approve that the owner of an NFT can invoke to authorize someone else to transfer this token. Approvals can also be explicitly revoked, and of course approvals are reset if ownership for an NFT changes.

In addition to this explicit approval that refers to a specific token ID, it is also possible to register another address as being authorized to make any transfer on your behalf, i.e. as an operator. Once defined, an operator can transfer any token that you own and can also make approvals and therefore authorize withdrawals. This global approval method has no equivalent in the ERC20 standard, but there is an extension (EIP-777) to the standard which adds this functionality for fungible token as well.

Finally, the standard defines events that are supposed to be emitted when a transfer is made, an approval is granted or revoked or an operator is named or removed.

The enumeration extension

The ERC721 standard makes it easy to figure out the current owner of an NFT once you know the token ID – simply call ownerOf using the token ID as argument. However, the token ID can, in general, be any 256 bit number, and there is no reason to assume that this will always be a simple sequence. As a consequence, it is not obvious how to figure out which token IDs are actually in use, i.e. which token have already been minted.

To address this, the standard defines a set of optional methods that allow a user to enumerate all existing tokens. This proceeds in two steps. First, the method totalSupply is supposed to return the total number of token in existence, i.e. token that have already been minted. Then, the method tokenByIndex can be called with an index less than the total supply to get the ID of a specific token. Similar, the balanceOf method (which is mandatory) returns the number of token held by a specific owner, and tokenOfOwnerByIndex can be used to enumerate these token. Implementing these optional methods requires an additional data structure in the contract, for instance an array that contains all token IDs.

This enumeration extension is the only standardized way to get a list of existing token IDs. It forces the contract, however, to implement and maintain additional data structures and I would assume that many contract owners have chosen not to implement it (in the next post, we will look at a few real-word examples, and as a matter of fact, none of them implements this extension). Alternatively, a contract could emit a (non-standard) initial log entry upon contract creation to indicate all token IDs that are available directly after contract creation, and then a user could monitor the Transfer events which, per the specification, should be emitted if an additional token is minted.

That concludes our post for today. You might have noted that we have not yet discussed an extension that is indicated in the diagram at the top of this article – the metadata extension. This extension touches upon an interesting question – if an NFT documents ownership in (say) a digital asset, where is the actual asset stored? This question and its ramifications will be the topic of the next post in this series.

Reconstructing the August 27th attack on the Ethereum blockchain

On August 27th 2021, a malicious transaction was created on the Ethereum mainnet, targeting a vulnerability in all versions of Go-Ethereum up to 1.10.7. If successful, this could have resulted in a fork of the production network. Fortunately, this could be avoided as a sufficient number of nodes were already running version 1.10.8 of Go-Ethereum which had been released as a hotfix three days ago. Armed with the understanding from my previous posts of how the EVM and in particular calls work internally, we are now in a position to analyze what really happened and how the exploit works.

What happened

On August 24th, the Go-Ethereum developer team rushed to release geth v1.10.8 which was announced as a hotfix, fixing a vulnerability that had been discovered during an audit of the Telos EVM, which is a copy of the Ethereum EVM running on the Telos blockchain. In the announcement, no details were made public yet, but in the meantime, more details have been posted by other teams and researchers (for instance here).

If you release a hotfix in an open source project, making it easy for everybody to simply ask GitHub to create a diff for you, obviously the black hats will start to reverse-engineer the changes to understand what the problem was and will try to exploit this. This is exactly what happened in this case as well.

In fact, three days later, on August 27th, one of the Go-Ethereum core developers posted an alert on Twitter, urging node maintainers to upgrade and announcing that an active attempt to exploit the vulnerability had been observed on mainnet. In the same thread, a link to the malicious transaction (with transaction hash 0x1cb6fb36633d270edefc04d048145b4298e67b8aa82a9e5ec4aa1435dd770ce4) on Etherscan was published shortly after. It turns out that the root cause of the issue is related to how geth handles the processing of calls and their return values, and, having gone through all this in the previous posts, we are now in a good position to understand what the problem was. In this section, we will use the details of the malicious transaction to replay it, both with geth 1.10.8 (where the problem has been fixed) and geth 1.10.6 (where the problem still exists), to understand why it has the potential to cause a split of the blockchain. In the next section, we will then analyze the source code to understand the issue and how it has been fixed.

Let us first replay the transaction using geth 1.10.8. I assume that you have copies of geth 1.10.8 and geth 1.10.6 in your path (if not, head over to the project download page and get the binaries for your OS). Our approach will be to create two blockchain data directories, one for each version, so that we start with the same initial state. We will then run the transaction against both versions and observe that the outcomes differ.

There is a little subtlety, though. If you start geth with a fresh data directory, it will also randomly create a new developer account which becomes part of the genesis block. Therefore, running geth twice with different data directories will in general not produce the same initial state. To avoid this, we share the key store between both instances, so that they both use the same developer account. So we will have three directories – geth1108 which will be the data directory for v1.10.8, geth1106 which will be the data directory for v1.10.6, and gethcommon which will contain the key store. We will start with geth v1.10.8 which will also create the developer account for us.

# Assume that geth1108 and geth1106 are the respective binary
# and on your path
mkdir geth1108
mkdir gethcommon
geth1108 \
   --datadir $(pwd)/geth1108/ \
   --keystore $(pwd)/gethcommon/ \
   --dev \
   --http

Once the client is running, let us, for later reference, get the hash of the genesis block. In a separate terminal (but in the same working directory), attach a geth console, and, once the console prompt appears, get the hash value of the genesis block.

geth1108 attach $(pwd)/geth1108/geth.ipc
eth.getBlockByNumber(0).hash

Write down this hash value somewhere, for me, it was 0x3b154292c6ec669d736df498663075cf7140b3aa3f287a5dc6b55477937f8ad6, but when you try this, you will get a different value as you will most likely get a different etherbase account and thus a different genesis block.

Now let us run the exploit transaction, determine the address of the contract that has been generated and get the contract (runtime) code and the hash of the resulting block.

dev = eth.accounts[0]
input = "0x3034526020600760203460045afa602034343e604034f3"
value = 0
txn = {
  "from": dev,
  "value": value,
  "input": input, 
  "gas": 200000
}
txnhash = eth.sendTransaction(txn)
// wait until the transaction has been mined, then run
c = eth.getTransactionReceipt(txnhash).contractAddress
eth.getCode(c)
eth.getBlockByNumber(1).hash

Again, write down the the hash value of the second block and the contract code. Now let us repeat all this with geth 1.10.6. Stop the console and the running instance of geth. Then start a new instance of geth 1.10.6, using, as explained above, a different data directory but the same key store directory.

mkdir geth1106
geth1106 \
   --datadir $(pwd)/geth1106/ \
   --keystore $(pwd)/gethcommon/ \
   --dev \
   --http

Looking at the startup messages of the client, you should be able to verify that the developer account is the same as before. Now start the geth console again, this time pointing to the IPC endpoint of the now running geth 1.10.6

geth1106 attach $(pwd)/geth1106/geth.ipc

Now, repeat the steps above. First, determine the hash value of the first block and verify that you get the same result as previously. Then, run the code above to also submit the transaction in our new blockchain, get the code and the hash number of the generated block.

You should see that the code generated by geth 1.10.6 is different. In fact, the code produced with geth 1.10.8 should be the contract address (32 bytes), followed by the last seven bytes of the contract address and filled up with zeros. For geth 1.10.6, the code consists of the first seven bytes of the contract address, followed by the full contract address and again filled up with zeros . Correspondingly, the hash of the new block is different (because the state is different and the state root is part of the block).

So we see that two different versions of the client start with the same state (the genesis block) and run the same transaction, but arrive at a different state after the transaction has been processed. This, of course, is a desaster – if a network is comprised of nodes with these two versions, the nodes will form two partitions (one running the new version and one running the old version) and the members of the two partitions will disagree about the correct state. Thus, in the worst case, the chain will fork.

Fortunately, this is not what happened in real life, as apparently a sufficiently large number of nodes had already upgraded to the latest version when the exploit hit the network.

So we have managed to reconstruct the attack and verify that it does, in fact, lead to a potential fork. Let us now try to understand what the problem was and how the exploit works.

Why it happened

To understand what the exploit code is doing, let us disassemble it, using for instance debug.traceBlockByNumber(1)[0].result in the geth console. Here is an opcode view of the input data (which, as we know, will be run as deploy bytecode when the transaction is processed).

ADDRESS
CALLVALUE
MSTORE
PUSH1 0x20
PUSH1 0x07
PUSH1 0x20
CALLVALUE 
PUSH1 0x04
GAS
STATICCALL
PUSH1 0x20
CALLVALUE
CALLVALUE
RETURNDATACOPY
PUSH1 0x40
RETURN

The first three lines will push the address of the contract being created and the call value (which is zero) onto the stack and run MSTORE, so that the stack will be empty again and the memory will contain the contract address at position 0x0.

Next, the code again sets up the stack, which, when we reach the STATICCALL, will look as follows (items at the top of the stack on the left)

remaining gas | 0x4 | 0x0 | 0x20 | 0x7 | 0x20

Now we know that a call to address 0x04 invokes the precompiled contract “data copy”. The input is specified by items three and four on the stack, i.e. the 32 bytes at address 0x0, which, as we know, is the contract address. The output is to be placed at address 0x7. Thus after returning, the memory contains the first 7 bytes of the contract at 0x0, followed by the full contract address.

Next, we again see a couple of instructions that prepare the stack, and then we invoke RETURNDATACOPY. Upon reaching this opcode, our stack is

0x0 | 0x0 | 0x20 | 0x1

Recall that RETURNDATACOPY is supposed to copy the result of the last call-like operation to memory. In this case, we ask the EVM to copy the result of the last CALL (which, as we know, is the contract address) to address 0x0. Thus after executing this statement, the memory should contain the contract address at location 0x0, followed by the last seven bytes of the contract address at the beginning of the second 32 byte word. The final RETURN would the return these two words as runtime bytecode, so that the runtime bytecode should be the contract address, followed by the last seven bytes of the contract address repeated. This, in fact, is what you observe when you look at the trace and the contract generated with geth 1.10.8

Unfortunately, with geth 1.10.6, the trace shows that here, the RETURNDATACOPY does not change the memory content at all. Consequently, the runtime bytecode is the current memory content, i.e. the first seven bytes of the contract address followed by the full contract address. This is the bug that has been discovered and exploited.

Let us now take a look at the Go-Ethereum source code and try to understand what the problem is. In our previous posts (here, here and here) on the inner workings of the EVM, we have already analyzed in detail how a call works internally. We process the STATICCALL in the opStaticCall function where we invoke the StaticCall method of the EVM. Here, we figure out that the target of the call is a precompiled contract, so we call RunPrecompiledContract. At this point in time, the input is a pointer to the memory of the calling contract, starting at offset 0x0, i.e. the contract address.

The implementation of the precompiled contract at address 0x4 now simply returns the exact same pointer again. Thus, when we get back into opStaticCall, the variable ret is now a pointer to offset 0x0 in the contract memory. Next, we copy the return value of the call (the contract address) to its target location in memory, i.e. to offset 0x7.

The problem is that this of course modifies the memory to which ret still points. Thus the value at the memory location pointed to by ret is now no longer the return value of the precompiled contract, but the new, overwritten memory content. Unfortunately, back in the main interpreter loop, we nevertheless use the returned pointer and assign it to the return data buffer (here). Thus the return data buffer does now not contain the original return value of the precompiled contract,as it should, but the already modified memory content. When we now access this with RETURNDATACOPY, we copy this memory content to itself, resulting in the effective no-op that we observe.

In version 1.10.8, this line has been added in opStaticCall which creates a copy of the return value before modifying the memory content, thereby avoiding the problem. Thus, as we can observe, version 1.10.8 correctly returns the actual contract address when executing the RETURNDATACOPY opcode. A nice example for the risks inherent with the use of pointers in any programming language…..

Personally, I was a bit surprised to see this happening, as a related vulnerability was already identified and fixed with v1.9.17 in July 2020. This is also an interesting coincidence as at this time, the developers had chosen not to declare this release a hotfix, and consequently, many miners did not upgrade. The vulnerability was then actually exploited in November 2020, and splitted nodes that had not yet been upgraded off from the network. The geth team later conducted a post mortem in which they also argued why they had chosen not the announce the fix in public but to effectively ship it as an unannounced hard fork. In hindsight, this probably was a good decision – after all, almost four months had passed after the release without anyone noticing and exploiting the change. In the current case, where the team has chosen to make the fix public and to urge operators to upgrade on social media, it only took three days between the upgrade and the exploit, so this will most likely re-ignite the debate on how the team should handle consensus bugs once they have been discovered (but hopefully also a debate about how to better catch this sort of issues in the future).

This closes our post for today – I hope you found it interesting to see how a real-world consensus bug might look like and how it could be exploited. Hope to see you soon.

Understanding the Ethereum virtual machine – part III

Having found our way through the mechanics of the Ethereum virtual machine in the last post, we are now in a position to better understand what exactly goes on when a smart contracts hand over control to another smart contract or transfers Ether.

Calls and static calls

Towards the end of the previous post in this series, we have already taken a glimpse at how an ordinary CALL opcode is being processed. As a reminder, here is the diagram that displays how the EVM and the EVM interpreter interact to run smart contract

In our case – the processing of a CALL – this specifically implies that the following steps will be carried out (we ignore gas processing for the time being, as this is a bit more complicated and will be discussed in depth in a separate section).

the interpreter hits upon the CALL opcode
it performs a look up in the jump table and determines that the function opCall needs to be run
it gets the parameters from the stack, in particular the address of the contract to be called (the second stack item)
it then extracts the input data from the memory of the currently executing code
we then invoke the Call method of the EVM, using the contract address and the input data as arguments
as we have learned, this will result in a new execution context (i.e. a new Contract object, a new stack and a freshly initialized memory) in which the code of the target contract will be executed
at the end, we get the returned data (an array of bytes) back
if everything went fine, we copy the returned data back into the memory of the currently executing contract

It is important to understand that this comes with a full context switch – the target contract will execute with its own stack and memory, state changes made by the target contract will refer to the state of the target contract, and Ether transferred with the call is credited to the target contract.

Also note that there are actually two ways how the result of the call is made available to the caller. First, the result of the call (a pointer to a byte array) will be copied to the memory of the calling contract. In addition, the return value is also returned by opCall and there it is copied once more, this time to a special buffer called the return data buffer. The caller can copy the data stored in this buffer and determine its length using the RETURNDATACOPY and RETURNDATALENGTH opcodes introduced with EIP-211 (in order to make it easier to pass back return data whose length is not yet known when the call is made).

In summary, the called contract is executed essentially as if it were the initial contract execution of the underlying transaction. Calls can of course be nested, so we now see that a transaction should be considered as the top-level call, which can be followed by a number of nested calls (actually, this number is limited, for instance by the limited depth of the call stack).

Of course, executing an unknown contract can be a significant security risk. We have seen an example in our post on smart contract security, where a malicious contract calling back into your own contract can cause a double-spending. Therefore, it is natural to somehow try to restrict what a called contract can due. One of the first restrictions of this type is the introduction of the STATICCALL with EIP-214. A static call is very much like an ordinary call, except that the called contract is not allowed to make any state changes, in particular no value transfer is possible as part of a static call.

The function opStaticCall realizing this is actually very similar to the processing of an ordinary call. There are two essential differences. First, there is no value and therefore one parameter less that needs to be taken from the stack. Second, the method of the EVM that is eventually invoked is not Call but StaticCall. The structure of this function is very similar to that of an ordinary call, so let us focus on the differences. Here is a short snipped (leaving out some parts to focus on the differences) of the Call method.

evm.Context.Transfer(evm.StateDB, caller.Address(), addr, value)
code := evm.StateDB.GetCode(addr)
contract := NewContract(caller, AccountRef(addrCopy), value, gas)
contract.SetCallCode(&addrCopy, evm.StateDB.GetCodeHash(addrCopy), code)
ret, err = evm.interpreter.Run(contract, input, false)

And here is the corresponding code for a static call (again, I have made a few changes to better highlight the differences).

addrCopy := addr
code := evm.StateDB.GetCode(addr)
contract := NewContract(caller, AccountRef(addrCopy), new(big.Int), gas)
contract.SetCallCode(&addrCopy, evm.StateDB.GetCodeHash(addrCopy), code)
ret, err = evm.interpreter.Run(contract, input, true)

So we see that there are three essential differences. First, in a static call, there is value transfer – this is as expected, as a static call is not allowed to make a value transfer which represents a change to the state. Second, when we build the contract, the third parameter is zero – again, this is related to the fact that there is no value transfer, as this parameter determines the value that, for instance, the opcode CALLVALUE returns. Finally, we set the third parameter of the Run function to true. In our discussion of the Run method in the previous post, we have already seen that this disallows all instructions which are marked as state changing.

Delegation and the proxy pattern

Apart from calls and static calls, there is a third way to invoke another contract, namely a delegate call. Roughly speaking, a delegate call implies that instead of executing the code of the called contract within the context of the called contract, we execute the code within the context of the caller. Thus, we essentially run the code of the called contract as if it were a part of the caller code, as you would run a library (however, this is of course not how libraries are actually realized in Solidity where a library is simply linked into the contract at build time).

In the EVM, a delegate call is done using the opcode DELEGATECALL (well, that did probably not come as a real surprise). Similar to a static call, there is no value transfer for this call and correspondingly no value parameter on the stack. Going through the same analysis as for a static call, we find that execution of the opcode delegates to the method DelegateCall() of the EVM. Let us again look at the parts of the code that differ from an ordinary call.

addrCopy := addr
code := evm.StateDB.GetCode(addr)
contract := NewContract(caller, AccountRef(caller.Address()), nil, gas).AsDelegate()
contract.SetCallCode(&addrCopy, evm.StateDB.GetCodeHash(addrCopy), code) 
ret, err = evm.interpreter.Run(contract, input, false)

Looking at this , we spot three differences compared to an ordinary call. First, the second parameter used for the creation of the new contract (which is the parameter which will determine the self field of the new contract and with that the address used to read and change state during contract execution) is not set to the target contract, but to the address of the caller, i.e. the currently executing contract, while the address used to determine the code to be run is still that of the target contract. Thus, as promised, we execute the code of the target contract within the context of the currently executing contract.

A second difference is the third argument used for contract creation, which is the value transferred with this call. Again, this is zero (even nil). Finally, after creating the contract, we execute its AsDelegate() method. This changes the attributes CallerAddress and value of the contract to that of the currently executing contract. Thus, whenever we execute the opcodes CALLVALUE or CALLER, we get the same values as in the context of the currently executing contract, as promised by EIP-7, the EIP which introduced delegate calls.

One of the motivations behind introducing this possibility was that it allows for a pattern known as proxy pattern. In this pattern, there are two contracts involved. First, there is the proxy contract. The proxy contract accepts a call or transaction and is responsible for holding the state. It does, however, not contain any non-trivial logic. Instead, it uses a delegate call to invoke the logic residing in a second contract, the logic contract.

Why would you want to do this? There are, in fact, a couple of interesting use cases for this pattern. First, it allows you to build an upgradeable contract. Recall that – at least until the CREATE2 opcode was introduced – it was not possible to change a smart contract after is has been deployed. Even though this is of course by intention and increases trust in a smart contract (it will be the same, no matter when you interact with it), it also implies a couple of challenges, most notably that it makes it impossible to add features to a smart contract over time or to fix a security issue. The proxy pattern, however, does allow you to do this. You could, for instance, store the address of the logic contract in the proxy contract instead of hard-coding it, and then add a method to the proxy that allows you to change that address. You can then deploy a new version of the logic to a new address and then update the address stored in the proxy contract to migrate from the old version to the new version. As the state is part of the proxy contract which stays at its current location, the state will be untouched, and as the address that the users interact with does not change, the users might not even notice the change. Needless to say that this is very useful for some cases, but can also be abused by tricking a user into trusting a contract and then changing its functionality, so be careful when interacting with a smart contract that performs delegation.

A second use case is related to re-use. As an example, suppose you have developed a smart contract that implements some useful wallet-like functionality, maybe time-triggered transfers. You want to make this available to others. Now you could of course allow anybody to deploy your smart contract, but this would lead to many addresses on the blockchain containing exactly the same code. Alternatively, you could store the logic in one logic contract and than only distribute the code for the proxy. A new user would then simply deploy a proxy, so each proxy would act as a wallet with an individual state and balance, but all of them would run the same logic. Again, it goes without saying that this implies that your users trust you and your contract – if, for instance, your logic contract is able to remove itself (“self-destruct” using the corresponding opcode), than this would of course render all deployed proxies useless and the balance stored in them would be lost forever.

Finally (and this apparently was one of the motivation behind EIP-7) you could have a large contract whose deployment consumes more gas than the gas limit of a block allows. You could then split the logic into several smaller logic contracts and use a proxy to tie them together into a common interface.

There are several ongoing attempts to standardize this pattern and in particular upgradeable contracts. EIP-897, for instance, proposes a standard to expose the address to which a proxy is pointing. EIP-1967 addresses an interesting problem that the pattern has – the logic contract and the proxy contract share a common state, and thus the proxy contract needs to find a way to store the address of the logic contract without conflicting with the storage layout of the logic contract. Finally, EIP-1822 proposes a standard for upgradeable contracts. It is instructive to read through these EIPs and I highly advise you to do so and also have a look at the implementations described or linked in them.

Gas handling during a call

Let us now turn to gas handling during a call. We have already seen that, as for every instruction, there is a constant gas cost and a dynamic gas cost. In addition, there are two special contributions which are not present for other instructions – a refund and a stipend.

The constant gas cost is simple – this is simply a constant value of (currently) 700 units of gas, increased from previously 40 with EIP-150. The dynamic gas cost is already a bit more complicated and itself consists of four positions. The first three positions are rather straightforward

first, there is a fee of 9000 units of gas when a non-zero value is transferred as part of the call
second, there is an account creation fee of 25000 whenever a non-zero value is transferred to a non-existing account as part of the call
third, there is the usual gas fee for memory expansion, as for many other instructions

The fourth contribution to the dynamic gas cost is a bit more tricky. The problem we are facing at this point is that the contract which is called will of course consume gas as well, but at this point in time, we do not know how much this is going to be. To solve this, a position called the gas cap is used. Initially, this gas cap was simply the first stack item, i.e. the first argument to the CALL instruction, which specifies the gas limit for the contract to be executed, i.e. the part of our remaining gas that we want to pass on to the called contract. We could now simply use this number as additional gas cost and then, once the called contract returns, see how much of that is still unused and refund that amount.

This is indeed how the gas payment for a call worked before EIP-150 was introduced. This EIP was drafted to address denial-of-service attacks that utilized the fact that the costs for some instructions, among them making a call, was no longer reflecting the actual computing cost on the client. As a counter-measure, the cost for a call was increased from previously 40 to the new still valid 700. This, however, caused problems with existing contract that tried to calculate the amount of gas they would make available to called contract by taking the currently remaining gas (inquired via the GAS opcode) and subtracting the constant fee of 40 units of gas. To avoid this, the developers thought about coming up with a mechanism which allowed a contract to make “almost all” remaining gas available to the caller, without having to hard-code gas fees. More precisely, “almost all” means that the following algorithm is applied to calculate the gas cap.

Determine the gas which is currently still available, after having deducted the constant gas cost already
Determine the base fee, i.e. the dynamic gas cost for the call calculated so far (memory fee, transfer fee and creation fee)
Subtract this from the remaining gas to determine the gas which will still be available after paying for all other gas cost contributions (“projected available gas”)
Read out the first value from the stack (the first parameter of the GAS instruction), i.e. the requested gas limit
determine a gas cap as 63 / 64 times the projected available gas
if the requested gas limit is higher than the gas cap, return the gas cap, otherwise return the requested gas limit

Thus a contract can effectively pass almost all of the remaining gas to the callee by providing a very large requested gas limit as first argument to the CALL instruction, so that the requested gas limit is definitely smaller than the calculated cap. The factor of 63 / 64 has been put in as an additional protection against recursive calls. The outcome of this algorithm is then used for two purposes – as an upfront payment to cover the maximum amount of gas that the callee might need, and as the gas supply that the callee actually obtains for its execution.

Now, I have been cheating a bit as there are two components in the diagram above that we have not yet discussed. First, I have just told you that the outcome of the EIP-150 algorithm is passed as available gas to the callee. This, however, is only true if the call does not transfer any Ether. If it does, there is an additional stipend of 2300 gas which is added to the gas made available to the callee before actually making the call. Note that this stipend does not count against the gas cost of the callee, as it is not part of the dynamic gas cost, so it effectively has two implications – it reduces the cost of the call by 2300 units of gas and, at the same time, it makes sure that even if the caller specified zero as gas limit for the call, the callee has at least 2300 units of gas available. The motivation of this is that a call with a non-zero value typically triggers the receive function or fallback function of the called contract, and calls with a gas supply of zero will let this function fail. Thus the gas stipend serves as a safe-guard to reduce the risk of a value transfer failing because the recipient is a smart contract and its receive- or fallback-function runs out of gas.

Finally, there is the refund, which happens here and simply amounts to adding the gas that the callee has not consumed to the available gas of the current execution context again.

The gas stipend and transfers in Solidity

The gas stipend is one of the less documented features of smart contracts, and a part of the confusion that I have seen around this topic (which, in fact, was the main motivation for the slightly elaborated treatment in this post) comes from the fact that a gas stipend exists in the EVM as well as in Solidity.

As explained above, the EVM adds the gas stipend depending on the value transferred with the call – in fact, the stipend only applies to calls with a non-zero value. In addition to this, Solidity applies the same logic, but only if the value is zero. To see this, you might want to use a simple contract like this one.

contract Transfer {

    uint256 value;

    function doIt() public {
        payable(msg.sender).transfer(value);
    }
}

If you compile this, for instance in Remix, and take a look at the generated bytecode, you will see that eventually, the transfer translates into a CALL instruction. The preparation of the stack preceding this instruction is a bit involved, but if you go through this carefully and wait until the dust has settled, you will find that the top of the stack looks as follows.

(value == 0) * 2300 | sender | value |

Thus the first value, which specified the gas to be made available for the subcontract, is 2300 (the gas stipend) if the value is zero, and zero otherwise. In the first case, the EVM will not add anything, in the second case, the EVM will add its own gas stipend. Thus, regardless of the value, the net effect will be that the gas stipend of 2300 units of gas always applies for a transfer. You might also want to look at this snippet in the Solidity source code that creates the corresponding code (at least if I interpret the code correctly).

What this analysis tells us as well is that there is no way to instruct the compiler to increase the gas limit of the transfer. As the 2300 units of gas will only be sufficient for very simple functions, you need a different approach when invoking contracts with a more complex receive function. When we discuss NFTs in a later post in this series, we will see how you can use interfaces in Solidity to easily call functions of a target contract. Alternatively, to simply invoke the fallback function or the receive function with a higher gas limit, you can use a low-level call. To see this in action, change the transfer in the above sample code to

(bool success, ) = 
     payable(msg.sender).call{value: value}("");

When you now compile again, take a look at the resulting bytecode and locate the CALL instruction, you will see that immediately before we do the CALL, we execute the GAS opcode. As we know, this pushes the remaining available gas onto the stack. Thus the first argument to the CALL is the remaining gas. As, by the EIP-150 algorithm above, this is in every case more than the calculated cap, the result is that the cap will be used, i.e. almost all remaining gas will be made available to the called contract. Be sure, however, to check the return value and handle any errors that might have occurred in the called contract, as Solidity does not add extra code to make sure that we revert upon errors. Note that there is an ongoing discussion to extend the functionality of transfer in Solidity to allow a transfer to explicitly pass on all the remaining gas, see this thread.

With this, we have reached the end of our post for today. In this and the previous two posts, we have taken a deep-dive into how the Ethereum virtual machine actually works, guided by the yellow paper and the code of the Go-Ethereum client. In the next post, we will move on and start to explore one of the currently “hottest” applications of smart contract – non-fungible token. Hope to see you soon!

Understanding the Ethereum virtual machine – part II

In todays post, we will complete our understanding of how the EVM executes a smart contract. We will investigate the actual interpreter loop, discuss gas handling and have a short look at pre-compiled contracts.

The jump table and the main loop

In the last post, we have seen that the entry point to the actual code execution is the Run method of the EVM interpreter. This method is essentially going through the bytecode step by step and, for each opcode, looking up the opcode in a data structure known as jump table– Among other things, this table contains a reference to a Go function that is to be executed to process the instruction. More specifically, an entry in the jump table contains the following fields, which partially refer to other tables in other source code files.

First, there is a Go function which is invoked to process the operation
Next, there is a gas value which is known as the constant gas cost of the operation. The idea behind this is that the gas cost for the execution of an instruction typically has two parts – a static part which is independent of the parameters and a dynamic part which depends on parameters like the memory consumption or other parameters. This field represents the static part
The third field is again a function that can be used to derive the dynamic part of the gas cost
The fourth field – minStack – is the number of stack items that this operation expects
The next field – maxStack – is the maximum size of the stack that will still allow this operation to work without overflowing the stack. For most operations, this is simply the maximum stack size minus the number of items that the operation pops from the stack plus the number of items that it adds to the stack
The next field, memorySize, specifies how much memory the opcode needs to execute. Again, this is a function, as the result could depend on parameters
The remaining fields are a couple of flags that describe the type of operation. The flag halts is set if the operation ends the execution of the code. At the time of writing, this is set for the opcodes STOP, RETURN and SELFDESTRUCT.
Similarly, the reverts flag indicates whether this opcode explicitly reverts the execution and is currently only set for the REVERT opcode itself
The return flag indicates whether this opcode returns any data. This is the case for the call operations STATICCALL, DELEGATECALL, CALL, and CALLCODE, but also for REVERT and contract creation via CREATE and CREATE2
The writes flag indicates whether the operation modifies the state and is set of operations like SSTORE
Finally, the jumps flag indicates whether the operation is a jump instruction and therefore modifies the program counter

Another data structure that will be important for the execution of the code is a set of fields known as the call context. This refers to a set of variables that make up the current of the interpreter and are reset every time a new execution starts, like memory, stack and the contract object.

Let us now go through the Run method step by step and try to understand what it does. First, it increments the call stack depth which will be decremented again at the end of the function. It also sets the read only flag of the interpreter if not yet done and resets the return data field. Next, we initialize the call context and set the program counter to zero before we eventually enter a loop called the main loop.

Within this loop, we first check every 1000th step whether the abort flag is set. If yes, we stop execution (my understanding is that this feature is primarily used to cancel running EVM operations that were started as part of an API call). Next, we use the current value of the program counter to read the next opcode that we need to process, and look up that operation in the jump table (raising an error if there is no entry, which indicates an invalid opcode).

Once we have the jump table entry in our hands, we can now check the current stack size against the minimum and maximum stack size of the instruction and make sure that we raise an error if we try to process an operation in read-only mode that potentially updates the state.

We then calculate the gas required to perform the operation. As already explained, the gas consumption has two parts – a static part and a dynamic part. For each of these two contributions, we invoke the method UseGas() of the contract object, which will reduce the gas left that the contract tracks and also raise an error if we are running out of gas.

We then execute the operation by invoking the Go function to which it is mapped. This function will typically get some data from the stack, perform some calculations and push data back to the stack, but can also modify the state and perform more complex operations. Most if not all operations are contained in instructions.go, and it is instructive to scan the file and look at a few operations to get a feeling for how this works (we will go through a more complex example, the CALL operation, in a later post).

Once the instruction completes, we check the returns flag of the jump table entry to see whether the instruction returns any data, and if yes, we copy this data to the returnData field of the interpreter so that it is available for the next instruction. We then decide whether the execution is complete and we need to return to leave the main loop, or whether we need to continue execution with an updated program counter.

So the main loop is actually rather straightforward, and, together with our discussion of the Call() method in the previous post, we now have a fairly complete picture of how contract execution works.

Handling gas consumption

Let us leverage this end-to-end view to put together the various bits and pieces to understand how gas consumption is handled. We start our discussion on the level of an entire block. In one of the previous posts, we have already seen that when a block is processed here, two gas related variables are maintained. First, the processing keeps track of the gas used for all transactions in this block, which corresponds to the gasUsed field of a block header. In addition, there is a block gas pool, which is simply a counter initialized with the current block gas limit and used to keep track of the gas which is still available without violating this limit.

When we now process a single transaction contained in the block, we invoke the function applyTransaction. In this function, we increase the used gas counter on the block level by the gas consumed by the transaction and use that information to create the transaction receipt, that contains both the gas used by the transaction and the current value of the cumulative gas usage on the block level. This is done based on the return value of the ApplyMessage function, which itself immediately delegates to the TransitionDB method of a newly created state transition object.

The state transition object contains two additional gas counters. The first counter (st.gas) keeps track of the gas still available for this transaction, and is initialized with the gas limit of the transaction, so this is the equivalent of the gas pool on the block level. The second counter is the initial value of this field and only used to be able to calculate the gas actually used later on.

When we now process the transaction, we go through the following steps.

First, we initialize the gas counters
Then, we deduct the upfront payment from the senders balance. The upfront payment is the gas price times the gas limit and therefore the maximum amount of Ether that the sender might have to pay for this transaction
Similarly, we reduce the block gas limit by the gas limit of the transaction
Next, we calculate the intrinsic gas for the transaction. This is the amount of gas just for executing the plain transaction, still without taking any contract execution into account. It is calculated (ignoring contract creations) by taking a flat fee of currently 21000 units of gas per transaction, plus a fee for every byte of the transaction input (which is actually different for zero bytes and non-zero bytes). In addition, there is a fee for each entry in the access list (this is basically a list of accounts and addresses for which a discount applies when accessing them, see EIP-2930). In the yellow paper, the intrinsic gas is called g₀ and defined in section 6.2
We then reduce the remaining gas by the intrinsic gas cost (again according to what section 6.2 of the yellow paper prescribes) and invoke Call(), using the remaining gas counter st.gas as the parameter which determines the gas available for this execution. Thus the gas available to the contract execution is the gas limit minus the intrinsic gas cost. We have already seen that this creates a Contract containing another gas counter which keeps track of the gas consumed during the execution. Within the interpreter main loop, we calculate static and dynamic gas cost for each opcode and reduce the counter accordingly. At the end, the remaining gas is returned
We update the remaining gas counter st.gas with the value returned by Call(). We then perform a refund, i.e. we the remaining gas times gas price back to the sender and also put the remaining gas back into the gas pool on block level

This has a few interesting consequences. First, it demonstrates that the total gas cost of executing a transaction does actually consist of two parts – the intrinsic gas for the transaction and the cost of executing the opcodes of the smart contract (if any). Both of these components have a static part (the 21000 base fee for the intrinsic gas cost and the static fee per opcode for the code execution) and a dynamic part, which depends on the transaction.

The second thing that you want to remember is that in order to make sure that a transaction is processed, it is not sufficient to have enough Ether to pay for the gas actually used. Instead, you need to have at least the gas limit times the gas price, otherwise the upfront payment will fail. Similarly, you need to make sure that the gas limit of your transaction is lower than the block gas limit, otherwise the block will not be mined.

Pre-compiled contracts

There is a special case of calling a contract that we have ignored so far – pre-compiled contracts. Before diving down into the code once more, let me quickly explain what pre-compiled contracts are and why they are useful.

Suppose you wanted to develop a smart contract that needs to calculate a hash value. The EVM has a built-in opcode SHA3 to calculate the Keccak hash, but what about other hashing algorithms? Of course, as the EVM is Turing-complete, you could develop a contract that does this, but this would blow up your contract considerably and, in addition, would probably be extremely slow as this would mean executing complex mathematical operations in the EVM. As an alternative, the Ethereum designers came up with the idea of a pre-compiled contract. Roughly speaking, this is a kind of extension of the instruction set of the EVM, realized as contracts located at pre-defined addresses. The contract at address 0x2, for instance, calculates an SHA256 hash, and the contract at address 0x3 a RIPEMD-160 hash. These contracts are, however, not really placed on the blockchain – if you look at the code at this address using for instance the JSON API method eth_getCode, you will not get anything back. Instead, these pre-defined contracts are handled by the EVM. If the EVM processes a CALL targeting one of these addresses, it does not actually call a contract at this address, but simply runs a native Go function that performs the required calculation.

We have already seen where in the code this happens – when we initialize the target contract in the Call() method of the EVM, we check whether the target address is a pre-compiled contract and, if yes, execute the associated Go function instead of running the interpreter. The return values are essentially the same as for an ordinary call – return data, an error and the gas used for this operation.

The pre-compiled contracts as well as the gas cost for executing them are defined in the file contracts.go. At the time of writing, there are nine pre-compiled contracts, residing (virtually) at the addresses 0x1 to 0x9:

EC recover algorithm, which can be used to determine the public key of the signer of a transaction
SHA256 hash function
RIPEMD-160 hash function
the data copy function, which simply returns the input as output and can be used to copy large chunks of memory more efficiently than by using the built-in opcodes
exponentation module some number M
three elliptic curve operations to support zero-knowledge proofs (see the EIPs 196 and 197)
the BLAKE2 F compression function hash function (see EIP-152)

Here is the final flow diagram for the smart contract execution that now also reflects the special case of a pre-compiled contract.

With this, we close our post for today. In the next post, we will take a closer look at the CALL opcode and its variations to understand how a smart contract can invoke another contract.

Understanding the Ethereum virtual machine – part I

In todays post, we will shed some light on how the Ethereum virtual machine (EVM) actually works under the hood. We start with an overview of the most relevant data structures and methods and explain the big picture before we look at the interpreter main loop in the next post.

The Go-Ethereum EVM – an overview

To be able to analyze in depth what really happens if a specific opcode is executed, it is helpful to take a look at both the yellow paper and the source code of the Go-Ethereum (geth) client implementing what the yellow paper describes. The code for the EVM is in this folder (I have used version 1.10.6 for the analysis, but the structure should be rather stable across releases).

Let us first try to understand the data structures involved. The diagram below shows the most important classes, attributes and methods that we need to understand.

First, there is the block context. This class is simple, it simply contains some data fields that represent attributes of the block in which the transaction is located and is used to realize opcodes like NUMBER or DIFFICULTY. Similarly, the transaction context (TxContext) holds some fields of the transaction as part of which we execute the smart contract.

Let us now turn to the Contract class. The name of this class is a bit misleading, as it does in fact not represent a smart contract, but the execution of a smart contract, either as the result of a transaction or, more generally, of a message call. Its most important attributes (at least for our present discussion) are

The code, i.e. the smart contract code which is actually executed
the input provided
the gas available for the execution
the address at which the smart contract resides (self)
the address of the caller (caller and CallerAddress)

It is important to understand the meanings of the various addressed contained in this structure. First, there is the self attribute, which is the contract address, i.e. the address at which the contract itself resides. This is the address which is called I_a in the yellow paper, which is returned by the ADDRESS opcode and which is the address holding the state manipulated by the code, for instance when we run an SSTORE operation. This is also the address returned by the Address() method of the contract.

Next, there is the caller and the callerAddress. In most cases, these two addresses are identical and represent the source of the message call, i.e. what is called the sender I_sin the yellow paper. There are cases, however, namely so called delegate calls, where these address are not identical. We will come back to this in the next post.

The contract object also maintains the gas available for the execution. This field is initialized when the the execution starts and can then be reduced by calling UseGas() to consume a certain amount of gas.

Next, there is the EVM itself. The EVM refers to a state (StateDB), a transaction context and a block context. It also holds an attribute abort which can be set to abort the execution, and a field callGasTemp which is used to hold the gas value in some cases, we will see this field in action later.

Finally, there is the EVM interpreter. The interpreter is doing all the hard work of running a piece of code. For that purpose, it references a jump table which is essentially a list of opcodes together with references to corresponding Go functions that need to be run whenever this opcode is encountered. The interpreter also maintains the scope context which is a structure bundling the data that is refreshed with every execution of a smart contract – the content of the memory, the content of the stack and the contract execution, represented by a contract object.

Code execution in the yellow paper

Before we move on to understand how the code execution actually works, let us take a short look at the yellow paper, in particular sections 6, 8 and 8 describing contract execution, and try to map the data structures and functions described there to the part of the source code that we have just explored.

The central function that describes the execution of a contract code in the yellow paper is a function denoted by a capital Theta (Θ) in the yellow paper. This function has the following arguments.

the state on which the code operates
the sender of the message call or transaction
the origin of the transaction (which is always an EOA and the address which signed the transaction)
the recipient of the message call
the address at which the code to be executed is located (this is typically the same as the recipient, but might again differ in the case of delegated calls)
the gas available for the execution
the gas price
the value to be transferred as part of the message call (again, there is a subtlety for delegate calls that we postpone to the next post)
the input data of the message call
the depth of the call stack
a flag that can be used to prevent the transaction from making any changes to the state (this is required for the STATICCALL functionality)

If you compare this list with the data structures displayed above, you will find that this is essentially the combination of the EVM attributes, the transaction context, the scope context and the contract execution object. All this data is tied together in the EVM class, so it is natural to assume that the function Θ itself is realized by a method of this class – in fact, this is the Call method that we will look at in the next section.

The output of Θ is the updated state, the remaining gas, an object known as accrued substate that contains touched and destroyed accounts, the logs generated during the execution and the gas to be refunded.

The inner workings of Θ are described in section 8 of the yellow paper, First, the value to be transferred is deducted from the balance of the sender and added to the balance of the recipient. Then, the actual code is executed – this happens by calling another function denoted by Ξ (a capital greek xi) – again, there is an exception to this rule for pre-compiled contracts that we discuss in the next post. If the execution is not succesful, then the state is reset to the its previous value, if it is successful, the state returned by Ξ is used. The function Ξ is again not terribly to identify in the source code – it is the method Run() of the EVM interpreter which will be the subject of the next post.

The call method of the EVM

Let us now take a closer look at the method Call() of the EVM which implements what the yellow paper calls Θ. The source code for this method can be found here. For today, I will ignore pre-compiled contracts completely which we will discuss in the next post.

The method starts by running a few checks, like making sure that we do not exceed the call depth limit (which is defined to be 1024 at the moment) or that we do not attempt to transfer more than the available balance.

The next step is to take a snapshot of the current state. Internally, Go-Ethereum uses revisions to keep track of different versions of the state, and taking a snapshot simply amounts to remembering a revision to which we can revert later if needed.

Next, we check whether the contract address already exists. This might be a bit confusing, as it does not seem to make sense to call a contract at a non-existing address, or, more precisely, at an address not yet initialized in the state DB. Note, however, that “calling” means a bit more general “sending a message to an account”, which is also done if you simply want to transfer Ether to an account. Sending a message to a non-contract account is perfectly valid, and it might even be that this account has never been used before and is therefore not part of the cached state.

The next step is to actually perform the transfer of any Ether involved in the message call, i.e. we send value Wei from the sender to the recipient. We then get into the actual bytecode execution by performing the following steps.

get the code associated with the contract address (i.e. the runtime bytecode) from the state
if the length of the code is zero, return – there is nothing left to be done
initialize a new Contract object that represents the current execution.
initialize the contract code
call the Run method of the interpreter

We then collect the return value from the Run method and a potential error code and set gas to contract.Gas – this represents the gas still remaining after executing the code. We then determine the final return values according to the following logic.

If Run did not result in an error, return the return value, error code and remaining gas just assembled
If Run returned a special error code indicating that the execution was reverted, reset the state to the previously created snapshot
If the error code returned by Run is not a reverted execution, also fall back to the snapshot but in addition, set the remaining gas to zero, i.e. such an error will consume all the available gas

Invocations of the call method

Having understood how Call works, we are now left with two tasks. First, we need to understand how the EVM interpreters Run method works, which will be the topic of our next post. Second, we have to learn where Call is actually invoked within the Go-Ethereum source code.

Not quite surprisingly, this happens at several points (ignoring tests). First, in a previous post, I have already shown you that the EVM’s Call method is invoked whenever a transaction is processed as part of a state transition. This call happens here, and the parameters are as we would expect – the the caller is the sender of the transaction, the contract address is the recipient, and the input data, gas and value are taken from the StateTransition object. The remaining gas returned is again stored in the state transition object and used as a basis for computing the gas refunded to the sender. Note that this entry point is (via the ApplyMessage function) also used by the JSON API when the eth_call method or the eth_estimateGas method are requested.

However, this is not the only point in the code where we find a reference to the Call method. A second point is actually the EVM interpreter itself, more precisely the function opCall in instructions.go. The background of this is at in addition to a call due to a transaction, i.e. a call initiated by an EOA, we can of course also call a smart contract from another smart contract using the CALL opcode. This opcode is implemented by the opCall function, and it turns out that it uses the EVM Call method as well. In this case, the parameters are taken from the stack respectively from the memory location referenced by the stack items.

the top level item on the stack is the gas that is made available (as we will see in the next post, this is not exactly true, but almost)
the next item on the stack is the target address
the third item is the value to be transferred
the next two items determine offset and length of the input data which is taken from memory
the last two items similarly determine offset and length of the return data area in memor

It is interesting to compare the handling of the returned error code. First, it is used to determine the status code that is returned. If there was an error, the status code is set to zero, otherwise it is set to one. Then, the returned data is stored in memory in case the execution was successful or explicitly reverted, for other errors no return data is passed. Finally, the unused gas is again returned to the currently executing contract.

This has an important consequence – there is no automatic propagation of errors in the EVM! If a contract A calls a contract B, and contract B reverts (either explicitly or due another error), then the call will technically go through, and contract A does not automatically revert as well. Instead, you will have to explicitly check the status code that the CALL opcode puts on the stack and handle the case that contract B fails somehow. Not doing this will make your contract vulnerable to the “King of the Ether” problem that we have discussed in my previous post on contract security.

Finally, scanning the code will reveal that there is a third point where the Call method is invoked – the EVM utility that allows you to run a specified bytecode outside of the Go-Ethereum client from the command line. It is fun to play with this, here is an example for its usage to invoke the sayHello method of our sample contract (again, assuming that you have cloned my repository for this series and are working in the root directory of the repository). Note that in order to install the evm utility, you will have to download the full geth archive, containing all the tools, and make the evm executable available in a folder in your path.

VERSION=$(python3 -c 'import solcx ; print(solcx.get_solc_version())')
DIR=$(python3 -c 'import solcx ; print(solcx.get_solcx_install_folder())')
SOLC="$DIR/solc-v$VERSION"
CODE=$($SOLC contracts/Hello.sol --bin-runtime   | grep "6080")
evm \
  --code $CODE\
  --input 0xef5fb05b \
  --debug run

This little experiment completes this post. In the next post, we will try to fill up the missing parts that we have not yet studied – how the code execution, i.e. the Run method, actually works, what pre-compiled contracts are and how gas is handled during the execution. We will also take a closer look at contract-to-contract calls and its variations.

A deep-dive into Solidity – function selectors, encoding and state variables

In the last post, we have seen how the Solidity compiler creates code – the init bytecode – to prepare and deploy the actual bytecode executed at runtime. Today, we will look at a few standard patterns that we find when looking at this runtime bytecode.

Some useful tools

While analyzing the init bytecode in the last post, we have mainly worked with the output of the Solidity compiler known as opcode listing – the output generated when we supply the –opcode switch. One major drawback of this representation of the bytecode is that we had to manually count instructions to determine the target of a JUMP instruction. Before going deeper into the runtime bytecode of our sample contract, let us collect a few tools that can help us with this.

First, there is the Solidity compiler itself. In addition to the bytecode and the opcodes, it can also generate an enriched output known as assembly output when the –asm switch is used. To do this for our sample contract, run

VERSION=$(python3 -c 'import solcx ; print(solcx.get_solc_version())')
DIR=$(python3 -c 'import solcx ; print(solcx.get_solcx_install_folder())')
SOLC="$DIR/solc-v$VERSION"
$SOLC contracts/Hello.sol --asm --optimize

The output is a mixture of opcodes and statements combining several opcodes into one. The snippet

PUSH1 0x40
PUSH1 0x80
MSTORE

for instance, is displayed as

mstore(0x40, 0x80)

In addition, and that makes this representation very useful, offsets are tagged, so that it becomes much easier to identify jump targets.

Brownie does also offer some useful features to display opcodes of a smart contract. When Brownie compiles a contract, it stores build data in the build subdirectory, and the data in this subdirectory can also be accessed using Python code. In particular, we can access the full bytecode and the runtime bytecode of a compiled contract, like this.

// including init bytecode
project.TmpProject._build.get("Hello")['bytecode']
// runtime bytecode only
project.TmpProject._build.get("Hello")['deployedBytecode']

Alternatively, we can access the bytecode from the deployed contract.

me = accounts[0]
hello = Hello.deploy({"from": me})
// runtime bytecode
hello.bytecode
// full bytecode (input of deployment transaction)
hello.tx.input

In addition to the plain bytecode, Brownie also offers a data structure which contains the opcodes along with offsets and some additional useful information – the pcMap. This is a hash map where the keys are the offsets of the opcodes into the runtime bytecode (the pcMap contains only the runtime bytecode) and the values are again hash maps containing the name of the Solidity function to which the code belongs, the opcode itself and arguments to the opcode as far as applicable. To print this map in a readable format, you can use the following statements.

pcMap = project.TmpProject._build.get("Hello")['pcMap']
for i in sorted(pcMap.keys()):
  print(i, "-->", pcMap[i]);

The pcMap is particularly useful if we combine it with another feature that Brownie has to offer – tracing transactions. A transaction trace contains the exact opcodes executed as part of the transaction. Here is an example.

tx = hello.sayHello()
tx.call_trace()
tx.trace

So the call trace is just a stack trace, while the trace is an array whose entries represent the opcodes that have actually been executed, along with information like the gas cost of the step, the memory content before the step was executed and the stack and storage content before the step was executed. Using tx.source(), we can even get the source code that belongs to a trace step.

The Remix IDE has a similar capability. Once a transaction has been executed and is displayed on the main screen, you can click on the blue “Debug” icon next to the transaction, and a debugger window will open on the left of the screen. You can now step forward and back, inspect opcodes, stack, memory and storage and even set breakpoints. In the Remix IDE, you can even debug deployment transaction, which is not possible in Brownie.

Function selectors

Having all these tools at our disposal, it is now not terribly difficult to understand the actual runtime bytecode. Here is a list of the opcodes, along with a few comments and tags.

// This is the start of the runtime bytecode
// initialize free memory pointer
PUSH1 0x80 
PUSH1 0x40 
MSTORE 
// Repeat the check for a non-zero value
CALLVALUE 
DUP1 
ISZERO 
PUSH1 0xF
// conditionally jump to target 1 
JUMPI 
PUSH1 0x0 
DUP1 
REVERT 
// This is jump target 1. We get here only
// if the value is zero
JUMPDEST 
POP 
PUSH1 0x4 
CALLDATASIZE 
LT 
PUSH1 0x28 
JUMPI // conditional jump to jump target 2
// We only get here if we have at least four bytes
// of data
PUSH1 0x0 
CALLDATALOAD 
PUSH1 0xE0 
SHR 
DUP1 
PUSH4 
0xEF5FB05B 
EQ 
PUSH1 0x2D 
JUMPI 
// This is jump target 2
JUMPDEST 
PUSH1 0x0 
DUP1 
REVERT 
// This is jump target 3, here we enter
// the sayHello function
JUMPDEST 
PUSH1 0x33  // offset of jump target 4
PUSH1 0x35  // offset of jump target 5
JUMP 
// This is jump target 4
JUMPDEST 
STOP 
// This is jump target 5 
// The code starting here is the actual sayHello function
JUMPDEST 
PUSH1 0x40 
MLOAD 
PUSH32 0x3ACB315082DEA2F72DFEEC435F2B0E4DD95A4FD423E89C8CB51DC75FA38D7961 
SWAP1
PUSH1 0x0
SWAP1
LOG1 
JUMP

I have stripped off a few opcodes at the end which we will take about a bit later. Let us go through the code line by line and try to understand what it does.

The first three lines are familiar – we again initialize the free memory pointer which Solidity stores at memory address 0x40 to its initial value 0x80. Similary, we have already seen the next lines, starting with CALLVALUE, while analyzing the init bytecode. This code again checks that the value of the transaction is zero and reverts if this is not the case, reflecting the fact that our contract does not have a payable function. If the value is zero, the processing continues at the point in the code that I have called jump target 1.

Here, we first clean up the stack by popping the last value. We then push four onto the stack, followed by the output of CALLDATASIZE, which is the length of the transaction input field. The LT opcode compares these two values and pushes the result of the comparison onto the stack. If the result of the comparison is true, i.e. if we have less than four bytes in the input field, we jump to jump target 2, where we again revert.

To understand why this code makes sense, recall that the first four bytes of the input field are supposed to be the hash of the signature of the function we want to call. If we have less than four bytes, the call is not targeting a function, and as we do not have a fallback function, we revert.

If we have at least four bytes of data, we continue at the next line, where we first push zero onto the stack and then run CALLDATALOAD, which loads the first full 32 byte word of the call data onto the stack (the zero that we have just pushed is the offset). We then execute the set of instructions

PUSH1 0xE0 // 0xE0 is 224 
SHR
DUP1
PUSH4 0xEF5FB05B
EQ

This looks a bit mysterious, but is actually not too difficult to understand. After the first push, our stack looks as follows.

| 224 | first 32 bytes of transaction input |

When we then execute SHR, which is a shift operation to the right, we shift the second item on the stack by the number of bits specified by the first item to the right, so we shift the 32 bytes, i.e. 256 bits, by 224 bits to the right. This amounts to moving the first 32 bytes to the rightmost position, so that what we now have on the stack are the first four bytes of the input data, i.e. exactly those four bytes that contain the hash of the function signature. We then push four bytes on the stack, so that our stack is now

| 0xEF5FB05B | first four bytes of the function signature |

and use EQ to compare them, so that stack item at the top of the stack is now

first four bytes of function signature == 0xEF5FB05B

Now open Brownie and run

web3.keccak(text="sayHello()")[:4]

to convince yourself that the four bytes to which we compare are exactly the hash of “sayHello()”. Thus, we execute the conditional jump that comes next only if the first four bytes of the input data indicate that we want to call this method, otherwise we continue and hit upon our return statement.

The code that we have just seen therefore realizes the function selection. If your contract contains more than one function, you will see more than one comparison, and the upshot is that we either jump into the function that corresponds to the signature hash or revert (unless we have a fallback function).

This also tells us that in our case, the execution of sayHello() starts at jump target 3. The code that we see here is also typical. We push two values on the stack – first a return offset and then a jump target. We then jump, execute some code and eventually execute another jump. This second jump will then take its target from the stack, so it returns to the first offset that we have pushed onto the stack. In our case, we jump to target 5, execute the code there, and then jump back to target 4. This approach – pushing return values onto the stack – mimics the way how local functions are executed in other programming languages like C. In our case, jump target 4 is simply executing the STOP opcode which completes the execution without a return value.

Finally, let us take a look at the code at jump target 5, which is therefore the body of sayHello(). Here, we first run MLOAD to get the value of the free memory pointer. We then put a full 32 byte word onto the stack, namely the hash of the string “SayHello()'”, i.e. the signature of the event that we emit. We then swap the first two elements on the stack, push zero and swap once more. Our stack now looks as follows.

| 0x80 | 0x0 |  hash(event signature) | return address  |

Now we execute LOG1. Again, the yellow paper is our friend and tells us that the first entry on the stack is the offset of the log data, the second entry is the length and the third entry is the first (and, in this case, the only) topic. So we log an event with no data and topic being the hash of the event signature, as expected. The log statement will consume the first three stack items, and when we now jump, we therefore end up at tag 4, where we execute the STOP opcode to complete the transaction.

Encoding and state variables

We have now completed the analysis of our sample contract. A natural next step is to add more functionality to the contract and see how this changes the output of the compile. As an example, let us add some state to our contract. In the body of the contract code, add the line

uint256 value;

and the method

function store(uint256 _value) public {
    value = _value;
}

Let us now run the compiler again, this time with a few more flags that request additional output (the reason for this will become clear in a minute).

$SOLC contracts/Hello.sol \
       --asm \
       --optimize \
       --storage-layout \
       --combined-json generated-sources-runtime

Here is a listing of the relevant code that is newly added to our contract by the changes we have made. Again, I have added some comments and labeled the jump destinations from A to E.

PUSH1 0x47     // address of label B 
PUSH1 0x42     // address of label A 
CALLDATASIZE 
PUSH1 0x4 
PUSH1 0x76     // address of label D 
JUMP 
// Label A - this is at offset 0x42 
JUMPDEST 
PUSH1 0x0 
SSTORE 
JUMP 
// Label B - this is at offset 0x47
JUMPDEST 
STOP 
// Label C - this is at offset 0x48
// I have removed the code in this  section
// which we have already looked at before
// it logs the event and then jumps to label B
// where we STOP
// Label D - this is at offset 0x76 
JUMPDEST 
PUSH1 0x0 
PUSH1 0x20 
DUP3 
DUP5 
SUB 
SLT         
ISZERO 
PUSH1 0x87     // address of label E
JUMPI          // conditional jump to label E
PUSH1 0x0 
DUP1 
REVERT 
// Label E - this is at offset 0x87 
JUMPDEST 
POP 
CALLDATALOAD 
SWAP2 
SWAP1 
POP 
JUMP

The first few lines are again easy to interpret – we prepare a jump, which is an internal function call, i.e. we place a return address and, in this case, arguments on the stack and then jump to label D. When we get there, our stack looks as follows (recall that CALLDATASIZE puts the size of the calldata, i.e. the length of the transaction input in bytes, onto the stack).

4 | len(tx.input) | label A | label B

At label D, we put a few additional items on the stack. If you go through the instructions, you will find that when we reach the SUB opcode, the stack looks as follows.

len(tx.input) | 4 | 32 | 0 | 4 | len(tx.input) | A | B

Now we execute the SUB opcode, which will pop the first two items off the stack and push their difference. Thus, after completing this opcode, our stack will be

len(tx.input) - 4 | 32 | 0 | 4 | len(tx.input) | A | B

The next instruction, SLT, is a signed version of the less-than instruction that we have already seen. Together with the subsequent ISZERO which is a simple logical inversion, its impact is to provide the following stack.

!(len(tx.input) - 4 < 32) | 0 | 4 | len(tx.input) | A | B

To get an idea what this is supposed to do, looking at the assembler output helps. In the comments that Solidity has generated, we find a hint – utility.yul. As the Solidity documentation explains, this means that the code we are looking at is part of a library of utility functions, written in the Yul language (an intermediate language that Solidity uses internally). However, these utility functions are not stored anywhere in a file with this name, but are actually generated on the fly by the compiler (in our case, this happens here). The additional flag generated-source-runtime that we have used when running Solidity instructs the compiler to print out a Yul representation of the utility functions. The Yul code, the name of the function and the source code of the Solidity compiler that I have linked above solve the puzzle – the code we are looking at is supposed to decode the transaction input and to extract the argument (which is called _value in the source code of our contract).

Now the Solidity ABI demands that the argument be stored in the transaction input as a 256-bit, i.e. 32 byte word, directly after the four bytes containing the function signature. What the code that we are analyzing is doing is to check that the total length of the transaction input is at least those four bytes plus the 32 bytes. If this is not the case, we continue and revert. If this is the case, i.e. if the validation is successful, we perform a conditional jump and end up at label E. When we get there, our stack is

0 | 4 | len(tx.input) | A | B

We now remove the first item on the stack, use CALLDATALOAD to load a full 32 byte word starting at byte 4 of the transaction input onto the stack (i.e. the 32 byte word that is supposed to contain our parameter), and use two swaps and a pop operation to produce the following stack.

A  | _value | B

The conditional jump will therefore take us to label A again, with the _value parameter at the top of the stack. Hee, we push zero onto the stack and perform an SSTORE. This will store _value at position zero of the storage and leave us with the address of label B on the stack. The following jump will therefore take us to the STOP opcode, and the transaction completes.

So, the content at offset zero of the storage seems to represent the stored value. Here, we could easily derive this from the code, but in general, this can be more difficult. To help us to map the state variables declared in the source code to storage locations, Solidity creates a storage map which we have included in our output using the –storage-layout switch. The storage layout is an array, where each entry represents one state variable. For each variable, there is a slot and an offset. As indicated in the documentation, the slot is the address in the storage area, but one slot can contain more than one item (if an item is smaller than 32 bytes), and in this case, the offset is the offset within the slot. For dynamic data types, the layout is much more complicated, for mappings, for instance, the actual slot is determined as a hash value of he key.

Metadata and hashes

If you have followed the analysis carefully, you might have noted that the last few opcodes do not seem to be executed at all. In fact, they do not even make sense, starting already with an invalid opcode 0xFE. Again, the assembler output helps to interpret this – it designates this part of the bytecode as “auxdata”, which does in fact not contain valid bytecode, but the IFPS hash of the contract metadata (more precisely a CBOR encoded structure which contains the IPFS hash as a key)

The contract metadata, which can be produced using the –metadata compiler switch, is a JSON structure that contains, among other things

the contract ABI
the Keccak hash of the source code
the IPFS hash of the source code
the exact compiler version
the compiler settings used to produce the bytecode

The idea behind this is that a developer can store the metadata and the contract source in IPFS. A user who finds the contract on the blockchain can then use the last few bytes – the IPFS hash of the metadata – to retrieve that document from the IPFS network. As the metadata document contains the IPFS hash of the source, a user could now retrieve the source as well. This mechanism therefore allows you to link the source code to the contract and to prove that the contract bytecode has been created using the source code and a given set of compiler settings. Within the Solidity source code, all this happens here.

We have seen that the metadata hash and the runtime bytecode are separated by the invalid opcode 0xFE. This byte appears at another location in the full bytecode – the end of the init bytecode. In both cases, the motivation is the same – we want to avoid that, due to an error, the execution can continue past these boundaries. So we now realize that the full bytecode contains of three sections, separated by the invalid opcode 0xFE.

This closes our post for today. Of course, you could now add additional features to our contract, maybe return values or mappings, and see how this affects the generated bytecode. In the next post, however, we will turn to another topic which is central to understanding smart contracts – how the Ethereum virtual machine actually operates.

A deep-dive into Solidity – contract creation and the init code

In some of the previous posts in this series, we have already touched upon contract creation and referred to the fact that during contract creation, an init bytecode is sent as part of a transaction which is supposed to return the actual bytecode of the smart contract. In this and the next post, we will look at this in a bit more detail and, along the way, learn how to decipher the EVM bytecode for a simpler contract.

Contract creation – an overview

Before diving into details, let us first make sure we understand the contract creation process in Solidity. A good starting point is section 7 of the Ethereum yellow paper.

A transaction will create a contract if the recipient address of the transaction is empty (i.e. technically the zero address). A creation operation can contain a value, which is then credited to the address of the newly created contract (even though in Solidity, this requires a payable constructor). Then, the initialisation bytecode, i.e. the content of the init field of the transaction, is executed, and the returned array of bytes is stored as the bytecode of the newly created contract. Thus there are in fact two different types of bytecode involved during the creation of a smart contract – the runtime bytecode which is the code executed when the contract is invoked after its initial creation, and the init bytecode which is responsible for preparing the contract and returning the runtime bytecode.

To understand what “returning the runtime bytecode” actually means, we need to consult the definition of the RETURN opcode in appendix H. Here, the return value function H_return is specified, which is referenced in section 9 and defines the output of a bytecode execution. It takes a moment to get familiar with the notation, but what the definition actually says is that the output is placed in the virtual machine memory, where the offset is determined by the top of the stack and the length is determined by the second element on the stack. Thus the init bytecode needs to

make any changes to the state of the contract address needed (maybe initialize some state variables)
place the runtime bytecode somewhere in memory
push the length of the runtime bytecode onto the stack
push the offset of the runtime bytecode (i.e. the address in memory where it starts) onto the stack
execute the RETURN statement

To make this a bit more tangible, let us again use Brownie to see how this works in practice. We will use a simple sample contract which does nothing except logging an event when its sayHello method is invoked. So make sure that you have a Brownie project directory containing this contract (if you have cloned my repository, I recommend to create a tmp subdirectory and link the contract there, as described here), and open the Brownie console. Then, we deploy a copy of the contract and inspect the transaction that Brownie has used to do this.

me = accounts[0]
hello = Hello.deploy({"from": me})
tx = web3.eth.get_transaction(hello.tx.txid)  
tx
hello.balance()

You should see that the value of the transaction is zero, the recipient is None and the input is an array of bytes, starting with 0x60806040. This is the init bytecode, which we will study in the remaining part of the post. You can also see that the initial balance of the contract is zero.

Reading EVM bytecode – the basics

Before we dive into the init bytecode, we first have to collect some basic facts about how the Ethereum virtual machine (EVM) works. Recall that the bytecode is simply an array of bytes, and each byte will be interpreted as an operation. More precisely, appendix H of the yellow paper contains a list of opcodes each of which represents a certain operation that the machine can perform, and during execution, the EVM basically goes through the bytecode, tries to interpret each byte as an opcode and executes the corresponding operation.

The EVM is what computer scientists call a stack machine, meaning that virtually all operations somehow manipulate the stack – they take arguments from the stack, perform an operation and put the resulting value onto the stack again. Note that most operations actually consume values from the stack, i.e. pop them. As an example, let us take the ADD operation, which has bytecode 0x1. This operation takes the first two values from the stack, adds them and places the result on the stack again. So if the stack held 3 and 5 before the operation was executed, it will hold 8 after the operation has completed.

Even though most operations take their input from the stack, there are a few notable exceptions. First, there are the PUSH operations, which are needed to prepare the stack in the first place and cannot take their arguments from the stack, as this would create an obvious chicken-and-egg challenge. Instead, the push operation takes its argument from the code, i.e. pushes the byte or the sequence of bytes immediately following the instruction. There is one push operation for each byte length from 1 to 32, so PUSH1 pushes the byte in the code immediately following the instruction, PUSH2 pushes the next two bytes and so forth. It is important to understand that even PUSH32 will only place one item on the stack, as each stack item is a 32 byte word, using big endian notation.

The init bytecode

Armed with this understanding, let us now start to analyze the init bytecode. We have seen that the init bytecode is stored in the transaction input, which we can, after deployment, also access as hello.tx.input. The first few bytes are (using Solidity 0.8.6, this might change in future versions)

0x6080604052

Let us try to understand this. First, we can look up the opcode 0x60 in the yellow paper and find that it is the opcode of PUSH1. Therefore, the next byte in the code is the argument to PUSH1. Then, we see the same opcode again, this time with argument 0x40. And finally, 0x52 is the opcode for MSTORE, which stores the second stack item in memory at the address given by the first stack item. Thus, in an opcode notation, this first piece of the bytecode would be

PUSH1 0x80
PUSH1 0x40
MSTORE

and would result in the value 0x80 being written to address 0x40 in memory. This looks a bit mysterious, but most if not all Solidity programs start with this sequence of bytes. The reason for this is the how Solidity organizes its memory internally. In fact, Solidity uses the memory area between address zero and address 0x7F for internal purposes, and stores data starting at address 0x80. So initially, free memory starts at 0x80. To keep track of which memory can still be used and which memory areas are already in use, Solidity uses the 32 bytes starting at memory address 0x40 to keep track of this free memory pointer. This is why a typical Solidity program will start by initializing this pointer to 0x80.

We could now continue to analyze the remaining bytecode in this way, manually looking up opcodes in the yellow paper, but this if of course not terribly efficient. Instead, let us ask the Solidity compiler to spit out the opcodes for us, instead of the plain bytecode. We do not even have to download and install Solidity, because we have already done this when installing the py-solcx module. So let us politely ask Python to spit out the location and version number of the solc binary and invoke it to compile our contract to opcode.

VERSION=$(python3 -c 'import solcx ; print(solcx.get_solc_version())')
DIR=$(python3 -c 'import solcx ; print(solcx.get_solcx_install_folder())')
SOLC="$DIR/solc-v$VERSION"
$SOLC contracts/Hello.sol --opcodes

As a result, you should see something like this (I have added linebreaks to make this more readable and only reproduced the first few opcodes).

====== contracts/Hello.sol:Hello =======
Opcodes:
PUSH1 0x80 
PUSH1 0x40 
MSTORE 
CALLVALUE 
DUP1 
ISZERO 
PUSH1 0xF 
JUMPI 
PUSH1 0x0 
DUP1 
REVERT 
JUMPDEST               <---- Marker A
POP 
PUSH1 0x99 
DUP1 
PUSH2 0x1E 
PUSH1 0x0 
CODECOPY 
PUSH1 0x0 
RETURN 
INVALID 
PUSH1 0x80             <--- Marker B
PUSH1 0x40 
MSTORE

This is much better (in fact, Solidity can actually produce a number of different output formats – as we go deeper into the actual runtime bytecode in the next post, we will find –asm useful as well). I have also added two markers manually to the output that we will need when discussing the code.

We have already analyzed the first three lines, so let us look at the next section of the code, starting at CALLVALUE. Again, we can consult the yellow paper to figure out what this instruction does – it gets the value of the transaction and stores it on the stack. We then duplicate this value on the stack, so that the stack now looks like this

| value | value |

and invoke the ISZERO operation. This operation takes the first stack item and replaces it by one if it is zero or by zero otherwise. Next, we push 0x1F, so our stack now looks like this

| 0x1F | value == 0 | value

The next instruction is JUMPI. This is a conditional jump which is only executed if the second stack item is non-zero, and in this case, we jump to the point in the bytecode designated by the first stack item. Thus, if the value of the transaction is zero, we jump to the offset 0x1F, otherwise we continue.

Let us suppose for a moment we include a non-zero value with our transaction. Then, we continue with the next statement after the JUMPI, push zero onto the stack, duplicate and REVERT. Consulting the yellow paper once more, we find that the two topmost items on the stack that are present when we do a revert are used to define the return value – the rule is the same as for RETURN, meaning that the first item on the stack is an offset, the second item is the length. Thus with two zeroes on the stack, we do not return anything. Summarizing, we revert the transaction if the contract creation transaction has a non-zero value, and Solidity generates this code because we have not declared a payable constructor.

Let us now see how the execution proceeds if the value is zero. To be able to do this, we have to figure out the instruction at offset 0x1F (15). So let us count – every instruction consumes one byte, and the additional arguments to PUSH1 also consume one byte each. Thus, we find that the execution continues at the JUMPDEST instruction that I have called marker A. The JUMPDEST opcode does not actually do anything, it is simply a marker byte that the EVM uses to make sure that a jump points to valid location. So we now enter the part of the code that reads like this.

JUMPDEST               <---- Marker A
POP 
PUSH1 0x99 
DUP1 
PUSH2 0x1E 
PUSH1 0x0 
CODECOPY 
PUSH1 0x0 
RETURN 
INVALID 
PUSH1 0x80             <--- Marker B

Note that at this point, we still have the transaction value on the stack, which we remove with the first POP statement. We then push 153, duplicate this, push 30 and zero, so the stack now looks like this

| 0 | 30 | 153 | 153 |

The next instruction is CODECOPY. This copies code of the currently running contract to memory. It consumes three parameters from the stack. The element at the top of the stack defines the target address (i.e. offset) in memory. The second parameter defines the source offset in the code, and the third parameter defines the number of bytes to copy.

Counting once more, we see that the code we copy is 153 bytes long and starts at the point that I have called marker B. The code starting there will therefore be copied to address zero in memory, and after that has been done, our stack contains 153. We then push 0, so that the stack now looks like

| 0 | 153 |

Finally, we RETURN. Now recalling how the return value of a contract execution is defined, we see that the return value of executing all of this is the bytearray of length 153 stored at address zero in memory, which, as we have just seen, are the 153 bytes of code starting at marker B. So the upshot is that this is the runtime bytecode, and the code we have just analyzed does nothing but (after making sure that the transaction value is zero) copying this bytecode into memory and returning it (by the way – if you want to see where exactly in the Solidity source code this happens, this link might be a good entry point for your research).

That’s it – we have successfully deciphered the initialization procedure of a very simple smart contract. Note that if the contract had a constructor, it would be executed first, before copying the runtime bytecode and returning (you might want to add a simple constructor and repeat the analysis). In the next post, we will learn a few additional tricks to obtain useful representations of the runtime bytecode and ten dive into how the runtime bytecode works. See you!

Smart contract security – some known issues

Smart contracts are essentially immutable programs that are designed to handle valuable assets on the blockchain and are mostly written in a programming language that has been around for a bit more than five years and is still rapidly evolving. That sounds a bit dangerous, and in fact the short history of the Ethereum blockchain is full of notable examples that demonstrate how smart contracts can be exploited to steal money. In this post, we will look at a few known security issues that you should try to avoid.

Background – receiving payments with Solidity

Not quite surprisingly, most exploits that we have seen that target smart contracts are somehow related to those parts of a contract that makes payments, so let us first try to make sure that we understand how payments are handled in Solidity.

First, recall that as any other address, a contract address has a balance and can therefore receive and transfer Ether. On the level of the Ethereum virtual machine (EVM), this is rather straightforward. Whenever a smart contract is called, be it from an EOA or another contract, the message call specifies a value. This is the amount of Ether (in Wei) that should be transferred as part of the contract execution. Very early in the processing, before any contract code is actually executed, this amount is transferred from the balance of the caller to the balance of the callee (unless, of course, the balance of the caller is not sufficient).

In Solidity, the situation is a bit more complicated. To see why, let us first imagine that you write and deploy a smart contract and then someones transfers Ether to the contract address. That amount is then added to the balance of the contract, and to access it, you would either need to submit a transaction signed with the private key of the contract address, or the contract itself needs to implement a function that can transfer the Ether to some other address, preferrably an EOA address. Now, a smart contract address has no associated private key – it is the result of a calculation at the time the contract is created, not a key generation process. So the only way to use Ether that is held by a contract is to invoke a function of the contract that transfers it out of the contract again. Thus if you accidentally transfer Ether to a smart contract which does not have such a function, maybe because it was never designed to receive Ether, the Ether is lost forever.

To avoid this, the designers of Solidity have decided that contract functions that can receive Ether need to be clearly marked as being able to handle Ether by declaring them as payable. In fact, if a contract method is not marked as being payable, the compiler will generate code that, if that method is called, checks if the message call specifies a non-zero value, i.e. if Ether should be transferred as part of the call. If yes, this code will revert the execution so that the transfer will fail.

Apart from an ordinary function call, there are special cases that we need to handle. First, it might of course happen that a smart contract is invoked without specifying a method at all. This happens if someone simply sends Ether to a smart contract (maybe without even knowing that the target address is a smart contract) and leaves the data field in the transaction (which, as we know, contains a hash of the target function to be called) empty. To handle this case, Solidity defines a special function receive. If this function is present in a contract, and the contract is called without specifying a target function, this method will be executed.

A similar mechanism exists to cover the case that a contract is invoked with a target function that does not exist or is invoked with no target function and no receive function exists. This special function is called the fallback function (in previous versions of Solidity, fallback and receive functions were identical). If none of these fallback functions is present, the execution will fail.

Send and transfer

Having discussed how a smart contract can receive Ether, let us now discuss how a smart contract can actually send Ether. Solidity offers different ways to do this. First, there is the send method. This is a method of an address object in Solidity and can be used to transfer a certain amount of Ether from the contract address to an arbitrary address. So you could do something like

address payable receiver =  payable(address(0xFC2a2b9A68514E3315f0Bd2a29e900DC1a815a1D));        
// Be careful, do NOT do this!
receiver.send(100);

to send 100 Wei to the target address receiver (note that in recent versions of Solidity, an address to which you want to send Ether needs to be marked as payable). However, this code already contains a major issue – it does not check the return value of send!

In fact, send does return true if the transfer was successful and false if the transfer failed (for instance because the current balance is not sufficient, or because the target is a smart contract without a receive or fallback function, or if the target is a contract with a receive function, but this function runs out of gas). If, as in this example, you do not check the return code, a failed transfer will go unnoticed. As an illustration, let us consider a famous example where exactly this happened – the King of the Ether contract . The idea of this contract was that by paying a certain amount of Ether, you could claim a virtual throne and be appointed “King of the Ether”. If someone else now pays an amount which is the amount which you have paid times a factor, this person would become the new King, and you would receive the amount that you invested minus a fee. In the source code of v0.4 of the contract, the broken section looks as follows (I have added a few comments not present in the original source code to make it easier to read the snippet without having the full context)

// we get to this point in the code if someone has paid enough to
// become the new king
// valuePaid is the Ether paid by the current king
// wizardCommission is a fee that remains in the account
// of the contract and can be claimed by the contract owner (wizard) 
uint compensation = valuePaid - wizardCommission;

// In its initial state, the current monarch is the wizard
// so we check for this
if (currentMonarch.etherAddress != wizardAddress) {
  // here we send the the Ether minus the fees back 
  // to the current king
  currentMonarch.etherAddress.send(compensation);
} else {
  // When the throne is vacant, the fee accumulates for the wizard.
}

Note how send is used without checking the return code. What actually happened is that some people who held the throne did apparently use what is called a contract based wallet, i.e. a wallet that manages your Ether in a smart contract. Thus, the address of the current king (currentMonarch) was actually a smart contract. If a smart contract receives Ether, then, as we have seen above, it will execute a function. Now send only makes a very small amount of gas (2300 to be precise) available to the called contract (this is called the gas stipend, and we will dive into this and how a call actually works under the hood in a later post), which was not sufficient to run the code. So the called contract failed, but, as the return value was not checked, the calling contract continued, effectively stealing the compensation instead of paying it out.

The withdrawal pattern

It is interesting to discuss how this can be fixed. The obvious idea might be to check the return value and revert the transaction if it is false. Alternatively, one can use the second method that Solidity offers to transfer Ether – the transfer method, which will revert if the transfer fails. This, however, results in a new problem, as it allows for a denial-of-service attack.

To see this, suppose that a contract successfully claims the throne, and then someone else tries to become the new king, resulting in the execution of the code above. Suppose we use transfer instead of send. Now the contract which is the current king might be a malicious contract with a receive function that always reverts, or no receive function at all. Then, any attempt to become the new king will be reverted, and the contract is stuck forever.

This is a very general problem that you will face whenever a method of a smart contract calls another contract – you can not rely on the other contract to cooperate and it is dangerous to assume that the call will be successful. Therefore, the Solidity documentation recommends a pattern known as the withdrawal pattern. In our specific case, this would work as follows. Instead of immediately paying out the compensation, you would store the claim in the contract state and allow the previous king to call a withdraw method that does the transfer, maybe like this.

// this replaces currentKing.send(compensation)
claims[currentKing]+=compensation
// code goes on...


// a new function that allows the current king to collect the compensation
function withdraw() public {
  uint256 claim = claims[msg.sender];
  if (claim > 0) {
    claims[msg.sender] = 0;
    payable(msg.sender).transfer(claim);
  }
  else {
    revert("Unjustified claim");
  }
}

Why would this help? Suppose an attacker implements a contract that reverts if Ether is sent to it. If this contract is the current king and someone else claims the throne, enthroning the new king will work, because the transfer is contained in the separate function withdraw. If now the attacker invokes this function, it will still revert, but this will not impact the functionality of the contract of other users, so not denial of service (impacting anyone except the attacker) will result.

Reentrancy attacks and TheDAO

Let us suppose for a moment that in the code snippet above, we had chosen a slightly different order of the statements, and, in addition, had decided to use a low-level call to transfer the money, like this

(bool success, bytes memory data) = msg.sender.call{value: claim}("");
require(success, "Failed to send Ether");
claims[msg.sender] = 0;

Here, we use the call method of an address, which has the advantage over transfer that it does not only make the minimum of 2300 units of gas available to the caller, but the full gas remaining at this point. This makes the contract less vulnerable to errors resulting out of non-trivial receive functions, which is the reason why it is sometimes recommended to use this approach instead of transfer.

This would in fact make our contract again vulnerable, this time to a class of attacks known as re-entrancy attack. To exploit this vulnerability, an attacker would have to prepare a malicious contract that enthrones itself and whose receive function calls the withdraw function again (but with a depth of at most one). If no someone else has claimed the throne and the malicious contract calls withdraw, the following things would happen.

The malicious contract calls withdraw for the first time
withdraw initiates the transfer of the current claim to the malicious contract
the receive function of the malicious contract is invoked
the receive function calls withdraw once more
at this point in time, the variable claims[msg.sender] still has its original, non-zero value
so the same transfer is made again
both transfers succeed, and the claim is overwritten by zero twice

As a result, the claim is transferred twice to the malicious contract (assuming, of course, that the King of the Ether contract has a sufficient balance). Of course instead of invoking the function twice, you can let the receive function call back into the contract several times, thus multiplying the amount transferred by the number of calls, limited only by the stack size and the available gas. This sort of vulnerability was the root cause for the famous theDAO hack, which eventually lead to a fork of the Ethereum block chain.

Note that in this case, using transfer instead of call would actually protect against this sort of attack, at the second call into the King of the Ether contract would require more gas than transfer makes available.

Create2 and the illusion of immutable contracts

Smart contracts are immutable – are they? Well, actually no – there are several ways to change the behaviour of a smart contract after it has been deployed. First, you could of course build a switch into your contract that only the owner can control. A bit more advanced, a contract can act as a proxy, delegating calls to another contract, and the contract owner could change the address of the target contract while keeping the address of the proxy the same.

An additional option has been created with EIP-1014. This proposal, which went live with the Constantinople hard fork in 2019, introduced a new opcode CREATE2 which allows for the creation of a contract with a predictable address. Recall that when a contract is created, the contract address is determined from the address of the owner and the current nonce. This makes it difficult to predict the address of the contract, unless you use an account for contract creation whose nonce is kept stable. When using CREATE2 instead, the contract address is taken to be the hash value of a combination of the sender address, a salt and the init code of the contract to be created.

The problem with this is, however, that the init code does not fully determine the runtime bytecode. Recall that the init code is bytecode that is executed at deployment time, and whose return value will be stored and used as the actual contract code executed at runtime (we will see this in action in the next post). The init code could, for instance, retrieve the actual runtime bytecode by calling into another contract. If the state of this contract is changed to return a different bytecode, the init code will still be the same. Thus, by using CREATE2 repeatedly with the same init code and salt, different versions of a contract could be stored at the same address.

To avoid this, the creators of EIP-1014 introduced a safeguard – if the target address already contains code or has a non-zero nonce, the invocation will fail. However, there is a loophole, which works as follows.

Prepare an init bytecode that get the actual runtime bytecode from a different contract, as outlined above
Use CREATE2 to deploy this runtime bytecode to a specific address
In the runtime bytecode, include a method that executes the SELFDESTRUCT opcode (protected by the condition that it only executes if the sender address is an address that you control). This is an opcode that will effectively wipe out the code of a contract and set the nonce of the contract address back to zero
Motivate people to deposit something of value in your contract, maybe Ether or token
At any point in time, you could now use this method to remove the existing contract. At this point, the nonce and code are both zero. You could now invoke CREATE2 once more to deploy a new contract to the same address with a different runtime bytecode, which maybe steals whatever assets have been deposited in the old contract

In this way, the functionality of a smart contract can be changed without anyone noticing it. Of course, this only works under specific conditions, the most important one being that the contract needs to contain the SELFDESTRUCT opcode. The only real protection is to have a look at the contract source code (or event at the runtime bytecode) before trusting it and become alerted if the contract has a SELFDESTRUCT in it (or uses an instruction like DELEGATECALL to invoke code that contains a SELFDESTRUCT). It seems that Etherscan is now able to track contract recreation using CREATE2, here is an example from the Ropsten test network, note the “Reinit” flag being displayed on the contract tab, and here is an example from mainnet.

This concludes our post for today. There are many more security considerations and pitfalls that you should be aware of whenever you develop a smart contract that is going to be used on a real network with real money being involved. In the next section, I have listed a few references that you might want to consult to learn more about smart contract security. I hope you found this interesting and see you again in the next post, in which we will take a closer look at how Solidity translates your source code into EVM bytecode.

References

Here is a list of references that I found useful while collecting the material for this post.

OpenZeppelin has a rather comprehensive list of post-mortems on its web site
Consensys maintains a collection of best practises for smart contracts that explain common vulnerabilities and how to protect against them
The Solidity documentation contains a section on security considerations
This paper contains a classification of common vulnerabilities and discusses which of them can be avoided by using Vyper instead of Solidity as a smart contract language
A similar list can be found in this conference paper
The implications of the CREATE2 opcode have been discussed in detail here
Finally, the documentation on ethereum.org contains a section on security considerations as well