In my last post, I took you through the foundations of the Ethereum blockchain, namely state, transactions and blocks, and how they are related. We have seen that nodes update their local state by executing the transactions contained in blocks. Today, we will take a closer look at how transactions are actually executed and discuss the killer feature of the Ethereum blockchain: the Ethereum virtual machine and smart contracts.
The Ethereum blockchain can be described as a state machine. At a given point in time, a node holds a certain state. Then a block is processed, i.e. inserted into the copy of the blockchain that the node holds, and the transactions contained in the block (and actually, as we will see, the block itself) make changes to the state. After the block has been processed, the node holds a new state. Formally (and this is what the yellow paper does), the new state is therefore a function of the previous state and the block. As all nodes agree on the rules by which the state is updated, and all nodes agree on the blocks and transactions, all nodes will eventually hold the same state.
To understand the changes to the state that are made during the processing of a block, there are two sources that we can consult. First, there is the yellow paper itself, which is a bit hard to read as one needs to get used to the notation, but which is the authoritative source and serves as the official specification. Second, we can look at the code of known implementations, like that of the geth client. It is actually quite instructive to compare the yellow paper with the structure of the code, so let us do this.
We start our journey with the code. One of the central objects in the code base is the BlockChain type, defined in core/blockchain.go. This type represents the copy of the blockchain held by the node, along with the up-to-date state. It has a method InsertChain that first validates block headers and then uses a StateProcessor to perform the updates to the state caused by a block or collection of blocks. The Process method of the StateProcessor is called with a block and a current state, and applies the changes represented by the transactions in this block to the state.
Time to switch to the yellow paper for a moment. In section two, the paper defines three state transition functions.
- First, there is a function denoted by a capital Pi that accepts a state and a block and returns the resulting new state. This could be called the state transition function on the block level
- Then, there is the corresponding function on the transaction level, denoted by a capital Upsilon, which accepts a state and a transaction and returns the updated state
- Finally, the paper defines a block finalization function denoted by a capital Omega and declares that the block level state transition function is obtained by first applying the transaction level state transition function for every transaction in the block and then applying the finalization function to the resulting state
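For readers who prefer formulas, the relation between the three functions can be written down roughly as follows. This is a simplified rendering of the yellow paper notation, where σ denotes a state, B a block and T_0, T_1, ... the transactions contained in B; consult the paper itself for the exact statement.

```latex
\begin{align}
\sigma_{t+1} &\equiv \Pi(\sigma_t, B) \\
\Pi(\sigma, B) &\equiv \Omega\bigl(B, \Upsilon(\Upsilon(\sigma, T_0), T_1) \dots\bigr)
\end{align}
```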
This gives us a first idea how the state update works, and is actually reflected nicely in the structure of the source code.
- Given a new block, loop through all transactions in the block
- For each transaction, invoke the function applyTransaction to process this transaction, so this function corresponds to the state transition function on the transaction level
- Once this has been done, finalize the block, corresponding to the block finalization function. This mainly involves the calculation and transfer of the block reward
- All this happens in the method Process, which thus corresponds to the block level state transition function
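The steps above can be sketched in a few lines of Python. This is only a toy model and not the geth code: the state is reduced to a dictionary of balances, and the names `Tx`, `Block`, `apply_transaction`, `finalize`, `process` and the value of `BLOCK_REWARD` are assumptions made for the illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List

State = Dict[str, int]  # toy state: address -> balance

BLOCK_REWARD = 2  # hypothetical reward, chosen only for the example


@dataclass
class Tx:
    sender: str
    recipient: str
    value: int


@dataclass
class Block:
    coinbase: str
    transactions: List[Tx] = field(default_factory=list)


def apply_transaction(state: State, tx: Tx) -> State:
    """Transaction level state transition (Upsilon in the yellow paper)."""
    if state.get(tx.sender, 0) < tx.value:
        raise ValueError("insufficient balance")
    new_state = dict(state)  # never modify the previous state in place
    new_state[tx.sender] = new_state.get(tx.sender, 0) - tx.value
    new_state[tx.recipient] = new_state.get(tx.recipient, 0) + tx.value
    return new_state


def finalize(state: State, block: Block) -> State:
    """Block finalization (Omega): here only the block reward is paid out."""
    new_state = dict(state)
    new_state[block.coinbase] = new_state.get(block.coinbase, 0) + BLOCK_REWARD
    return new_state


def process(state: State, block: Block) -> State:
    """Block level state transition (Pi): all transactions, then finalization."""
    for tx in block.transactions:
        state = apply_transaction(state, tx)
    return finalize(state, block)
```

Calling `process` with a state and a block returns the new state and leaves the old one untouched, which mirrors the view of the blockchain as a state machine.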
Looking at this function, there is an interesting point: right at the start of the function, a modification to the state is hardcoded which is applied if the number of the block being processed is equal to the block number of the famous DAO fork. So if you ever wanted to know how this fork really works, here is the answer: all clients that accepted the fork simply agree on a modification of the state transition function which transfers all funds in the DAO contract to a new contract from which the legitimate owners could withdraw them. In other words, the fork has modified the state transition function (this is sometimes called an irregular state change).
Let us now dive a little deeper into what processing a transaction actually means. Again, we can look at the source code (with the main part of the processing being done here) or at the yellow paper, specifically at sections six to nine. To simplify things a bit, let us assume for the time being that our transaction is an ordinary transaction, i.e. that we simply transfer Ether to some other account and no smart contract is involved (i.e. that the recipient is an EOA). In this case, the processing is actually not overly complicated.
First, a few checks are executed to see whether the transaction is valid. This includes checking that the nonce of the transaction is equal to the current nonce of the sender (before applying the state changes). As part of these checks, the upfront cost, which is defined as the gas limit in the transaction times the gas price, is deducted from the balance of the sender (if the gas limit is not exhausted, the remaining gas will be refunded at the end of the transaction). Then the nonce of the sender is incremented by one, and the actual transaction processing starts by invoking the Ethereum virtual machine. We will discuss what it does in the next section, but for the moment, be assured that if the recipient account contains no code, it simply transfers the Ether represented by the value field of the transaction from the sender to the recipient.
Once this is done, the remaining gas is refunded, and the gas used is credited to the miner (more precisely the beneficiary or coinbase of the block being processed). With this, the processing is complete and the next transaction starts.
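To make the bookkeeping concrete, here is a minimal sketch of these steps for a plain Ether transfer. It follows the simple fee model described above, in which the full fee goes to the beneficiary of the block, and it assumes that a plain transfer consumes exactly the base cost of 21000 gas. The class and function names are made up for this example and do not correspond to any client code.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Account:
    balance: int = 0
    nonce: int = 0


@dataclass
class Transaction:
    sender: str
    recipient: str
    value: int
    nonce: int
    gas_limit: int
    gas_price: int


TRANSFER_GAS = 21000  # base cost of a transaction that executes no code


def apply_plain_transfer(state: Dict[str, Account], tx: Transaction,
                         coinbase: str) -> int:
    """Apply a plain Ether transfer to the state and return the gas used."""
    sender = state[tx.sender]
    upfront_cost = tx.gas_limit * tx.gas_price

    # Validity checks, done before any change is made to the state
    if tx.nonce != sender.nonce:
        raise ValueError("invalid nonce")
    if tx.gas_limit < TRANSFER_GAS:
        raise ValueError("gas limit below base cost")
    if sender.balance < upfront_cost + tx.value:
        raise ValueError("insufficient balance")

    # Deduct the upfront cost and increment the nonce
    sender.balance -= upfront_cost
    sender.nonce += 1

    # The actual transfer of the value
    recipient = state.setdefault(tx.recipient, Account())
    sender.balance -= tx.value
    recipient.balance += tx.value

    # Refund the unused gas and credit the fee to the beneficiary
    gas_used = TRANSFER_GAS
    sender.balance += (tx.gas_limit - gas_used) * tx.gas_price
    state.setdefault(coinbase, Account()).balance += gas_used * tx.gas_price
    return gas_used
```

Note that, in total, the sender pays the value plus the gas used times the gas price, independent of the gas limit that was specified.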
The Ethereum virtual machine
We have now achieved a good overview of what happens during transaction processing, but we have glossed over an important point – the invocation of the Ethereum virtual machine (EVM) which happens here. So what is the EVM?
Technically, the EVM is a virtual machine, very similar to the Java virtual machine (JVM), that knows a certain set of instructions and a certain state: a stack, a memory and a program counter. When this machine executes a series of instructions known as bytecode, every instruction manipulates the state of the machine. The set of instructions (opcodes) is rich enough to make the EVM Turing complete and is described in appendix H of the yellow paper. Just to give you an idea, here are some examples of the available instructions.
- arithmetic instructions like ADD, SUB or MUL
- comparisons and boolean logic (AND, OR, LT, GT, ..)
- operations to access the execution context, like the balance of an account (BALANCE), the sender of the transaction (ORIGIN), or the GASPRICE
- flow control operations, like CALL (to call another smart contract) or REVERT (abort execution of a contract)
- operations to access the storage of the account under which the contract is executing, like SSTORE and SLOAD
Each of these operations has an associated gas value that is consumed when the EVM executes this operation.
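To get a feeling for how such a machine works, here is a tiny stack machine in Python that charges gas for every instruction. It is a sketch under strong simplifications: it only knows four instructions, the program is a list of tuples instead of real bytecode, and the gas values in `GAS_COST` should be read as illustrative numbers rather than as the authoritative fee schedule.

```python
from typing import List, Tuple

# Illustrative gas cost per instruction
GAS_COST = {"PUSH": 3, "ADD": 3, "MUL": 5, "STOP": 0}

WORD = 2 ** 256  # EVM words are 256 bits wide, so arithmetic wraps around


class OutOfGas(Exception):
    pass


def run(program: List[tuple], gas: int) -> Tuple[List[int], int]:
    """Execute the program and return the final stack and the remaining gas."""
    stack: List[int] = []
    pc = 0  # program counter
    while pc < len(program):
        op, *args = program[pc]
        cost = GAS_COST[op]
        if cost > gas:
            raise OutOfGas(f"out of gas at instruction {pc} ({op})")
        gas -= cost
        if op == "PUSH":
            stack.append(args[0] % WORD)
        elif op == "ADD":
            stack.append((stack.pop() + stack.pop()) % WORD)
        elif op == "MUL":
            stack.append((stack.pop() * stack.pop()) % WORD)
        elif op == "STOP":
            break
        pc += 1
    return stack, gas


# Computes (2 + 3) * 4
program = [("PUSH", 2), ("PUSH", 3), ("ADD",), ("PUSH", 4), ("MUL",), ("STOP",)]
```

With this program, `run(program, 100)` returns the stack `[20]` and 83 units of remaining gas, while `run(program, 10)` raises `OutOfGas` before the last PUSH.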
Now, whenever a transaction is processed, the node will check whether the account at the recipient address has non-empty code. If not, the processing is as described above. If yes, the address which is the target of the transaction is interpreted as a smart contract, and its associated code will be interpreted as bytecode and executed by the EVM (note that this happens after the contract's balance has been increased by the value of the transaction, so you can already spend this amount in the contract). Thus a smart contract is stored in the blockchain (as part of the state of its address) and its execution is triggered by making a transaction to the contract address, which might or might not involve a transfer of Ether.
During this execution, the code can make changes to the state, as it can write to the storage, and it can use the CALL operation to initiate a message call to another account, which works very similarly to a transaction and also has a parameter value to transfer Ether. All these changes will be incorporated into the final state, and thus the EVM execution becomes part of the state transition function.
Physically, the EVM is running on each node, but as the definition of the EVM is part of the consensus mechanism of the blockchain and the execution is fully deterministic, all nodes will arrive at the same updated state. We can therefore think of the EVM as a distributed virtual machine running synchronously on each node of the blockchain and updating “the” shared state of the blockchain.
As the EVM is Turing complete, this is a very powerful mechanism. Theoretically, a smart contract can perform any operation on the state that you can imagine. Of course, complex smart contracts consume a lot of gas and therefore their execution drives up transaction fees. This is actually a built-in security mechanism which prevents someone from deploying a smart contract that never completes (for instance by implementing an infinite loop) and blocks all nodes. In fact, as every instruction consumes gas, the gas limit is reached at some point (or the balance of the sender's account is exceeded), and the execution stops as the transaction runs out of gas. Even worse (for the attacker), the consumed gas is lost, so there is actually a strong incentive to keep the complexity of smart contracts low.
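The following sketch illustrates why an infinite loop cannot block a node. It models a contract that increments a counter in its storage a given number of times, or forever, and charges a fixed amount of gas per iteration. The function `run_loop` and the cost of 10 gas per step are assumptions made for the example; the point is only that the loop is stopped once the gas is used up, that the changes to the storage are then discarded and that the gas is gone nevertheless.

```python
import itertools
from typing import Dict, Optional, Tuple


def run_loop(storage: Dict[str, int], iterations: Optional[int],
             gas_limit: int, step_cost: int = 10
             ) -> Tuple[Dict[str, int], int, bool]:
    """Run a toy contract that increments a counter in a loop.

    iterations=None models an infinite loop. Returns the resulting storage,
    the gas charged and a flag telling whether the execution succeeded.
    """
    working = dict(storage)  # changes are made on a copy of the storage
    gas = gas_limit
    steps = itertools.count() if iterations is None else range(iterations)
    for _ in steps:
        if gas < step_cost:
            # Out of gas: discard all changes, but charge the full gas limit
            return dict(storage), gas_limit, False
        gas -= step_cost
        working["counter"] = working.get("counter", 0) + 1
    return working, gas_limit - gas, True
```

For instance, `run_loop({"counter": 0}, 3, 100)` succeeds, uses 30 gas and returns a counter of 3, whereas `run_loop({"counter": 0}, None, 100)` stops after ten iterations, returns the unchanged storage and charges the full 100 gas.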
We now understand what a smart contract really is: it is a sequence of instructions (bytecode) stored in an account and executed by each node whenever a transaction is directed to this account. As a smart contract can again invoke other smart contracts, you should think of the transaction execution as a chain. There is an initial transaction, which always comes from an EOA and is signed using a private key. This transaction can invoke a smart contract, which in turn can invoke another smart contract and so forth. The terminology is a bit confusing at this point because different sources use slightly different definitions of what a transaction and a message call are, but I tend to think of each step in the chain as a message call and of a transaction as the first step, which is distinguished from the other steps by being created and signed by an EOA, i.e. typically a human or a program outside of the chain operated by a human.
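This chain of message calls can be modelled in a few lines of Python. In the sketch below, contracts are ordinary Python functions registered under an address, and every step in the chain is recorded as a `Message` that carries both the immediate sender and the origin, i.e. the EOA that signed the initial transaction. All names (`World`, `Message`, `contract_a`, `contract_b`) are hypothetical and only serve to illustrate the terminology.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Message:
    origin: str  # the EOA that signed the initial transaction
    sender: str  # the immediate caller, an EOA or a contract
    to: str
    value: int


class World:
    def __init__(self) -> None:
        self.code: Dict[str, Callable[["World", Message], None]] = {}
        self.trace: List[Message] = []

    def call(self, msg: Message) -> None:
        """A message call: record it and run the code of the target, if any."""
        self.trace.append(msg)
        contract = self.code.get(msg.to)
        if contract is not None:
            contract(self, msg)

    def send_transaction(self, eoa: str, to: str, value: int) -> None:
        """The first step of the chain: origin and sender are both the EOA."""
        self.call(Message(origin=eoa, sender=eoa, to=to, value=value))


def contract_a(world: World, msg: Message) -> None:
    # Contract A forwards the call to contract B; the sender is now A itself
    world.call(Message(origin=msg.origin, sender=msg.to,
                       to="contract_b", value=0))


def contract_b(world: World, msg: Message) -> None:
    pass  # end of the chain
```

After registering both contracts in `world.code` and calling `world.send_transaction("alice", "contract_a", 5)`, the trace contains two message calls: one from alice to contract A and one from contract A to contract B, both with alice as origin.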
Let us summarize what we have learned today.
- When a node processes a block, it updates the copy of the blockchain state that it holds. The consensus mechanism makes sure that the updates done by all nodes agree
- As part of these updates, the node processes all transactions included in the block
- When a transaction has a non-zero value, this includes the transfer of the corresponding amount of Ether from the sender to the recipient
- If the recipient account of a transaction has non-empty code, i.e. is a smart contract, then the EVM built into the node will interpret this code as bytecode and execute it
- All changes to the state made by the contract are incorporated into the state update and therefore become part of “the” global blockchain state
- Thus we can think of a smart contract as a sequence of instructions that are stored in the blockchain and executed by the blockchain network when being triggered by a transaction
How are smart contracts created? Technically, everybody can assemble a smart contract and deploy it to the blockchain. To do this, you have to submit a transaction which contains your code (or, more precisely, code that returns your code when being executed) and which has an empty recipient address. When a node comes across such a transaction during transaction execution, it determines a contract address from the address and the nonce of the sender (this is the address to which we have to send a transaction to trigger the smart contract) and stores the contract in the code field of the state of this address.
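To illustrate how the contract address depends on sender and nonce: the address is formed from the last 20 bytes of the Keccak-256 hash of the RLP encoding of the pair (sender address, nonce). The sketch below implements the small part of RLP that is needed for this. As the Python standard library does not contain Keccak-256 (`hashlib.sha3_256` is the standardized SHA-3, which uses a different padding), the hash function is passed in as a parameter. All function names are chosen for this example only.

```python
from typing import Callable


def rlp_encode_bytes(data: bytes) -> bytes:
    """RLP-encode a byte string of at most 55 bytes."""
    if len(data) == 1 and data[0] < 0x80:
        return data  # a single small byte is its own encoding
    if len(data) > 55:
        raise ValueError("long strings are not needed for this sketch")
    return bytes([0x80 + len(data)]) + data


def rlp_encode_int(n: int) -> bytes:
    """RLP-encode a non-negative integer (big endian, no leading zeros)."""
    if n < 0:
        raise ValueError("negative integers cannot be encoded")
    return rlp_encode_bytes(n.to_bytes((n.bit_length() + 7) // 8, "big"))


def rlp_encode_sender_and_nonce(sender: bytes, nonce: int) -> bytes:
    """RLP-encode the two-element list [sender, nonce]."""
    if len(sender) != 20:
        raise ValueError("an address has 20 bytes")
    payload = rlp_encode_bytes(sender) + rlp_encode_int(nonce)
    # The payload is always shorter than 56 bytes here, so the short
    # list prefix is sufficient
    return bytes([0xC0 + len(payload)]) + payload


def contract_address(sender: bytes, nonce: int,
                     keccak256: Callable[[bytes], bytes]) -> bytes:
    """Last 20 bytes of the hash; keccak256 must be supplied by the caller."""
    return keccak256(rlp_encode_sender_and_nonce(sender, nonce))[-20:]
```

As the nonce of an account increases with every transaction, deploying the same code twice from the same account yields two different contract addresses.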
Now, to develop a smart contract, you will typically not write bytecode yourself, in the same way as JVM bytecode is typically created by a Java compiler. Several high-level programming languages have been proposed over time to ease the creation of smart contracts, the most popular one being Solidity. In addition, a huge number of development tools, frameworks and platforms have been created that make the creation of a smart contract very easy. In the next post, we will go through some of these tools, which will put us in a position to eventually write, compile, deploy and run our first smart contract.