Superconducting qubits – on islands, charge qubits and the transmon

In my previous post on superconducting qubits, we have seen how a flux qubit represents a qubits state as a superposition of currents in a superconducting loop. Even though flux qubits have been implemented and used successfully, most research groups today focus on different types of qubits using a charge qubit as an archetype.

Charge qubits – the basics

The basic idea of a superconducting charge qubit is to create a small superconducting area called the island which is connected to a circuit in such a way that we can control the number of charge carries that are located on the island. A typical way to do this is shown in the diagram below.

In this diagram, we see, in the upper left, a Josephson junction, indicated by the small cross. Recall that a Josephson junction consists of two superconducting electrodes separated by a thin insulator. Thus a Josephson junction has a capacity, which is indicated by the capacitor C in the diagram.

On the right of the Josephson junction is a second capacitor. Now charge carriers, i.e. Cooper pairs, can tunnel through the Josephson junction and reach the area between the second capacitor and the junction – this is our island. Conversely, Cooper pairs can cross the junction to leave the island again. The flow of Cooper pairs into the island and away from the island can be controlled by applying an external voltage Vg. Effectively, a certain number of Cooper pairs will be trapped on the island – this is why this circuit is sometimes called a Cooper pair box – but this number will be subject to quantum fluctuations due to tunneling through the junction. Roughly speaking, these fluctuations cause an oscillation which will give us energy levels, and we can use two of these energy levels to represent our qubit.

Let us try to understand how these energy levels look like. Again, I will try to keep things short and refer to my more detailed notes on GitHub for more details. The Hamiltonian for our system looks as follows.

$H = E_C[N - N_g] - E_0 \cos \delta$

Here, Eg and E0 are energies that are determined by the geometry of the junction and the value of the capacities in the circuit, N is the number of Cooper pairs on the island, Ng is a number depending on the external voltage and $\delta$ is (proportional to) the flux through the junction. Thus N and $\delta$ are our dynamic variables, representing the number of Cooper pairs on the island and its change over time, whereas the other quantities are parameters determined by the circuit and the external voltage. As Ng depends on the external voltage and can therefore be changed easily, this is the parameter we will be using to tweak our qubit.

Using a computer algebra program (or this Python notebook), it is not difficult to obtain a numerical approximation of the eigenvalues of this operator (we can, for instance, represent the Hamiltonian as a matrix subject to an eigenbasis for N and calculate its eigenvalues after cutting off at a finite dimension). The diagram below shows the results for E0 = 0.1 Ec.

We see that if Ng is an integer, there is a degeneracy between the first and second excited state. If, however, Ng is a half-integer, then this degeneracy is removed, and the first two energy levels are fairly well separated from the rest of the spectrum.

With this relation of the two energies EC and E0, the probability for a Cooper pair to tunnel through the Josephson junction is comparatively small. Thus, the fluctuations in the quantum number N are small, and the eigenstates of N are almost stationary and therefore almost energy eigenstates. As the energy levels are well separated, we can use the first two energy levels as a qubit. As the eigenstates of the operator representing the charge on the island are almost stationary states, this regime is called the charge regime and the resulting qubit is called the charge qubit.

The transmon

This is all nice, but in practice, there is still a problem. To understand this, let us take a look at the energy level diagram above again. The energy levels are not flat, meaning that a change in the value of Ng is changing the energy levels and therefore the stationary states. Unfortunately, the value of N g does not only depend on the external voltage Vg (which we can control), but also on charge noise, i.e. unwanted charge fluctuations that are hard to suppress.

Therefore, the charge qubit is quite sensitive to charge noise. The point Ng = 0.5, called the sweet spot, is typically chosen because at this point, at least the first derivative of the energy as a function of the charge vanishes, so that the qubit is only affected by charge noise to the second order. However, this dependency remains and limits the coherence time of the charge qubit.

One way to reduce the sensitivity to charge noise is to increase the ratio between E0 and EC . To understand what happens if we do this, take a look at the following diagram which displays the first few energy levels when this ratio is 5.0

We see that, compared to our first energy level diagram, the sensitivity to charge noise is reduced, as the energy of the first two energy levels is almost flat, with a minimal dependency on Ng . However, this comes at the cost of a more equidistant spacing of the energy levels, making the isolation between the first two energy levels and the rest of the spectrum hard, and the question arises whether we gain an advantage at the cost of another one.

Fortunately, it turns out that the sensitivity to charge noise decreases exponentially fast, but the anharmonicity of the energy levels decreases much slower. Thus there is a region for the ratio E0 / Ec in which this sensitivity is already comparatively low, but the energy levels are still sufficiently anharmonic to obtain a reasonable two-level system. A charge qubit operated in this regime is called a transmon qubit.

Technically, a larger value of E0 compared to EC is achieved by adding an additional capacitor parallel to the Josephson junction. If we use a high value for the additional capacity, we can make EC arbitrarily small and can achieve an arbitrary high ratio of E0 compared to EC.

To develop a physical intuition of why this happens, recall that the energy E0 of the Josephson junction measures the tunneling probability. If E0 is large compared to EC , it is comparatively easy for a Cooper pair to tunnel through the junction. Therefore the phase difference $\delta$ across the junction will only fluctuate slightly around a stationary value, i.e. the wave function will be localized sharply in the $\delta$-space. Consequently, the charge N will no longer be a good quantum number and the charge eigenstates will no longer be approximate energy eigenstates. Instead, we will see significant quantum fluctuations in the charge, which makes the system more robust to external charge noise. In this configuration, you should therefore think of the qubit state not as a fixed number of Cooper pairs on the island, but more as a constant tunneling current flowing through the junction.

To control and read out a transmon qubit, it is common to use a parallel LC circuit which is coupled with the transmon via an additional capacitor. Using microwave pulses to create currents in that LC circuit, we can manipulate and measure the state of the qubit and couple different qubits. Physically, the LC circuit is realized as a transmission line resonator, in which – similar to an organ pipe – waves are reflected at both ends and create standing wave patterns (that transmission lines are used is the reason for the name transmon qubit).

At the time of writing, most major players (Google, IBM, Rigetti) are experimenting with transmon based qubit designs, as it appears that this type of qubit is most likely to be realizable at scale. In fact, Transmon qubits are the basic building blocks of Google’s Bristlecone architecture as well as for IBMs Q experience and Rigettis QPU.

To learn more, I recommend this and this review paper, both of which are freely available on the arXiv.

Superconducting qubits – the flux qubit

In the last post, we have discussed the basic idea of superconducting qubits – implement circuits in which a supercurrent flows that can be described by a quantum mechanical wave function, and use two energy levels of the resulting quantum system as a qubit. Today, we will look in some more detail into one possible realization of this idea – flux qubits.

In its simplest form, a flux qubit is a superconducting loop threaded by an external magnetic field and interrupted by a Josephson junction. This is visualized on the left hand side of the diagram below, while the right hand side shows an equivalent circuit, formed by an inductance L, the capacity CJ of the junction and the pure junction.

It is not too difficult to describe this system in the language of quantum mechanics. Essentially, its state at a given point in time can be described by two quantities – the charge stored in the capacitor formed by the two leads of the Josephson junction, and the magnetic flow (or flux) through the loop. If you carefully go through all the details and write down the resulting equation for the Hamiltonian, you will find that the classical equivalent of this system is a particle moving in a potential which, for an appropriate choice of the parameters of the circuit, looks like the one displayed below.

This potential has a form which physicists call a double-well potential. Let us discuss the behavior of a particle moving in such a potential qualitatively. Classically, the particle would eventually settle down at one the minima. As long as its energy is below the height of the potential separating these two minima, it would remain in that state. In the quantum world, however, we would expect tunneling to occur, i.e. we would expect that our system has two basic states, one corresponding to the particle being close the left minimum and one corresponding to the particle being right to the minimum, and that we see a certain non-zero probability for the particle to cross the potential wall and to flip from one state into the other state. This two-state system already looks like a good candidate for a qubit.

Being a one-dimensional system, the eigenstate wave functions can be approximated numerically using standard methods, for example the “particle-in-a-box” approach. Again, I refer to my detailed notes for the actual calculation. The result is displayed below.

The diagram shows the ground state (blue curve on the left hand side) and the first excited state (blue curve on the right hand side). In both diagrams, I have added the classical potential (orange line) for the purpose of comparison. So we actually obtain the picture that we expect. Up to normalization, the ground state – the eigenstate on the left – is a superposition of two states

$|l \rangle - | r \rangle$

where $|l \rangle$ is a state localized around the left minimum of the potential and $|r \rangle$ is a state centered around the right minimum, whereas the first excited state is a linear combination proportional to

$|l \rangle + | r \rangle$

In general, there will be a small energy difference between these two states, which leads to a non-zero probability for tunneling between them. Intuitively, the state $|l \rangle$ corresponds to a supercurrent that flows through the loop in one direction and the state $|r \rangle$ is a state in which the supercurrent flows in the opposite direction. Our ground state – which would be the state $|0 \rangle$ in an interpretation as a qubit – and our first excited state $|1 \rangle$ are superpositions of these two states.

A nice property of the flux qubit is that the energy gap between the first excited state and the second excited state is much higher than the energy gap between the ground state and the first excited state. In the example used as basis for the numerical simulations described here, the second gap is more than one order of magnitude higher than the first gap. This implies that the two level system spanned by $|0 \rangle$ and $|1 \rangle$ is a fairly well isolated system and thus serves as a very good approximation to a qubit. The Hamiltonian can be manipulated by changing the flux through the loop by applying an external magnetic field or an external microwave pulse can be used to stimulate a transition from the ground state to the first excited state. In this way, the qubit can be read-out and controlled.

So theoretically, this system is a good candidate for the implementation of a qubit. In fact, this has been used in practice – the D-Wave quantum annealer is based on interconnected flux qubits. However, it seems that the flux qubit has come a bit out of fashion, and research has focussed on a new generation of superconducting qubits like the transmon qubit that work slightly differently. We will study this type of qubit in the next post in this series.

NMR based quantum computing: gates and state preparation

In my last post on NMR based quantum computing, we have seen how an individual qubit can be implemented based on NMR technology. However, just having a single qubit is of course not really helpful – what we are still missing is the ability to initialize several qubits and to realize interacting quantum gates. These are the topics that we will discuss today.

The physics and mathematics behind these topics is a bit more involved, and I will try to keep things simple. In case you would like to dive more deeply into some of the details, you might want to take a look at my notes on GitHub as well as the references cited there.

Pseudo-pure states

In quantum computing, we are typically manipulating individual qubits which are initially in a pure state. In an NMR probe, however, we usually have a state which is very far from being a pure state. Instead, we typically deal with highly mixed states, and it appears to be impossible to prepare all spins in an NMR probe in the same, well defined pure state. So we first need to understand how the usual formalism of quantum computing – expressed in terms of qubits being in pure states which are subject to unitary operations – relates to the description of an NMR experiment in terms of density matrices and mixed states.

The formalism that we will now describe is known as the formalism of pseudo-pure states. These are states that are described by a density matrix of the form

$\rho = \frac{1}{2^N}(1-\epsilon) + \epsilon |\psi \rangle \langle \psi |$

with a pure state $|\psi \rangle$. Here, N is the number of spins, and the factor 1/2N has been inserted to normalize the state. This density matrix describes an ensemble for which almost all molecules are in purely statistically distributed states, but a small fraction – measured by $\epsilon$ – of them are in a pure state. The second term in this expression is often called the deviation and denoted by $\rho_\Delta$.

Why are these states useful? To see this, let us calculate the time evolution of this state under a unitary matrix U. In the density matrix formalism, the time evolution is given by conjugation, and a short calculation shows that

$U(t) \rho U(t)^{-1} = \frac{1}{2^N} (1-\epsilon) + \epsilon |U(t) \psi \rangle \langle U(t) \psi |$

Thus the result is again a pseudo-pure state. In fact, it is the pseudo-pure state that corresponds to the pure state $U|\psi \rangle$. Similarly, one can show that if A is an observable, described by a hermitian matrix (which, for technical reasons, we assume to be traceless), then the expectation value of A evaluated on $|\psi \rangle$ – which is the result a measurement would yield in the pure state formalism – is, up to a constant, the same as the result of a measurement of A on an ensemble prepared in the pure state corresponding to $|\psi \rangle$!

Using these relations, we can now translate the typical process of a quantum computation as expressed in the standard, pure state formalism, into NMR terminology as follows.

• The process of initializing a quantum computer into an initial state $|\psi \rangle$ corresponds to putting the NMR probe into the corresponding pseudo-pure state with $\rho_{\Delta} \sim |\psi \rangle \langle \psi|$
• If the quantum algorithm is described as a unitary operator U (typically presented as a sequence of gates Ui), we apply the same unitary operator to the density matrix, i.e. we realize the gates as an NMR experiment
• We then measure the macroscopic quantity A which will give us – up to a factor – the result of the calculation

This is nice, but how do we prepare these states? Different researchers have come up with different techniques to do this. One of the ideas that is commonly applied is averaging. This is based on the observation that the state at thermal equilibrium is a pseudo-pure state up to an error term. This error term can be removed by averaging over many instances of the same experiment (while somehow shuffling the initial states around using a clever manipulation). So we first let the probe settle down, i.e. prepare it in a thermal state (which can even be at room temperature). We then run our quantum algorithm and measure. Next, we re-initialize the system, apply a certain transformation and re-run the algorithm. We repeat this several times, with a prescribed set of unitary transformations that we apply in each run to the thermal equilibrium state before running the quantum algorithm. At the end, we add up all our results and take the average. It can be shown that these initial “shuffling transformations” can be chosen such that the difference between the thermal state and the pseudo-pure state cancels out.

Quantum gates

Having this result in place, we now need to understand how we can actually realize quantum gates. In the last post, we have already seen that we can use RF-pulses to rotate the state of an NMR qubit on the Bloch sphere, which can be used to realize single qubit gates. But how do we realize two-qubit gates like the CNOT gate? For that purpose, we need some type of interaction between two qubits.

Now, in reality, any two molecules in an NMR probe do of course interact – by their electric and magnetic fields, by direct collision and so forth. It turns out that in most circumstances, all but one type of coupling called the J-coupling can be neglected. This coupling is an indirect coupling – the magnetic moment created by a nuclear spin interacts with the electric field of an electron, which in turn interacts with the magnetic moment of a different nuclear spin. In the Hamiltonian – in a rotating frame – the J-coupling contributes a term like

$\frac{2 \pi}{\hbar} J_{12} I^1_z I^2_z$

where J12 is a coupling constant. This term introduces a correlation between the two nuclear spins, similar to an additional magnetic field depending on the state of the second qubit. The diagram below is the result of a simulation of an initial state to which J-coupling is applied and demonstrates that the J-coupling manifests itself in a splitting of peaks in an NRM diagram.

An NMR diagram is the result of a Fourier transform, so an additional peak corresponds to a slow, additional rotation of the two spin polarizations around each other.

Let us take a closer look at this. If we apply a free evolution under the influence of J-coupling over some time t, it can be shown – again I refer to my notes on GitHub for the full math – that the time evolution is given by the operator

$U = \cos (\frac{\pi J_{12}}{2} t) -i \sigma_z^1 \sigma_z^2 \sin (\frac{\pi J_{12}}{2} t)$

If we choose t such that

$\frac{\pi J_{12}}{2} t = \frac{\pi}{4}$

and combine this time evolution with rotations around the z-axis of both qubits, we obtain the following transformation, expressed as a matrix

$R_{z^2}(-\frac{\pi}{2}) R_{z^1}(-\frac{\pi}{2}) U = \frac{1+i}{\sqrt{2}} \begin{pmatrix} 1 & & & \\ & 1 & & \\ & & 1 & \\ & & & -1 \end{pmatrix}$

Technically, this transformation is the result of letting the system evolve under the influence of J-coupling for the time t and then applying two sharp RF pulses, one at resonance with the first qubit and one at resonance with the second qubit, timed such that they correspond to a rotation on the Bloch sphere around the z-axis with angle $\pi / 2$. The matrix on the right hand side of this equation represents a two-qubit transformation known as the controlled phase gate. This gate corresponds to applying a phase gate to the second qubit, conditional on the state of the first qubit.

This already looks very similar to a CNOT gate, and in fact it is – one can easily show that the phase gate is equivalent to the CNOT gate up to single qubit operations, more precisely these two gates are related by

$C_{NOT} = (I \otimes H) C_{PHASE} (I \otimes H)$

where I is the identity and H is the Hadamard gate. As single qubit operations can be realized by RF pulses, this shows that a CNOT gate can be realized by a free evolution under the J-coupling, framed by sequences of RF pulses. This, together with single qubit gates, gives us a universal gate set.

This completes our short deep dive into NMR based quantum computing. Historically, NMR based quantum computers were among the first fully functional implementations of (non error-corrected) universal quantum computers, but have come a bit out of fashion in favor of other technologies. At the time of writing, most of the large technology players focus on a different approach – superconducting qubits – which I will cover in the next couple of posts on quantum computing.

Single qubit NMR based quantum computation

In the previous post, we have sketched the basic ideas behind NMR based quantum computation. In this post, we will discuss single qubits and single qubit operations in more depth.

The rotating frame of reference

In NMR based quantum computing, quantum gates are realized by applying oscillating magnetic fields to our probe. As an oscillating field is time dependent, the Hamiltonian will be time dependent as well, making some calculations a bit more difficult. To avoid this, it is useful to pass to a different frame of reference, called the rotating frame of reference.

To explain this, let us first study a more general setting. Assume that we are looking at a quantum system with Hamiltonian H and a state vector $|\psi \rangle$. Suppose further that we are given a unitary group, i.e. a time-dependent unitary operator

$T(t) = e^{-iAt}$

with a hermitian matrix A. We can then consider the transformed vector

$|\tilde{\psi}\rangle = T(t) |\psi \rangle$

Using the product rule and the SchrÃ¶dinger equation, we can easily calculate the time derivative of this vector and obtain

$i \hbar \frac{d}{dt} |\tilde{\psi}\rangle = \hbar A |\tilde{\psi} \rangle + T(t) H |\psi\rangle = \tilde{H} |\tilde{\psi}\rangle$

with

$\tilde{H} = THT^* + \hbar A$

In other words, the transformed vector again evolves over time according to an equation which is formally a SchrÃ¶dinger equation if we replace the original Hamiltonian by the transformed Hamiltonian $\tilde{H}$.

Let us now apply this to the system describing a single nuclear spin with spin 1/2 in a constant magnetic field B along the z-axis of the laboratory system. In the laboratory frame, the Hamiltonian is then given by

$H = \omega I_z$

with the Larmor frequency $\omega$ and the spin operator

$I_z = \frac{\hbar}{2} \sigma_z$

We now pass into a new frame of reference by applying the transformation

$T(t) = \exp ( \frac{i\omega_{ref}}{\hbar} t I_z )$

with an arbitratily chosen reference frequency $\omega_{ref}$. Geometrically, this is a rotation around the z-axis by the angle $\omega_{ref}t$. Using the formula above and the fact that T commutes with the original Hamiltonian, we find that the transformed Hamiltonian is

$\tilde{H} = (\omega - \omega_{ref}) I_z$

This is of the same form as the original Hamiltonian, with a corrected Larmor frequency

$\Omega = \omega - \omega_{ref}$

In particular, the Hamiltonian is trivial if the reference frequency is equal to the Larmor frequency.

Intuitively, this is easy to understand. We know that the time evolution in the laboratory frame is described by a precession with the frequency $\omega$. When we choose $\omega_{ref} = \omega$, we place ourselves in a frame of reference rotating with the same frequency around the z-axis. In this rotating frame, the state vector will be constant, corresponding to the fact that the new Hamiltonian vanishes. If we choose a reference frequency different from the Larmor frequency, we will observe a precession with the frequency $\Omega$.

Let us now repeat this for a different Hamiltonian – the Hamiltonian that governs the time evolution in the presence of an oscillating magnetic field. More precisely, we will look at the Hamiltonian of a rotating magnetic field (which is a good approximation for an oscillating magnetic field by an argument known as rotating wave approximation, see my notes for more details on this). In the presence of such a field, the Hamiltonian in the laboratory frame is

$H = \omega I_z + \omega_{nut} [I_x \cos (\omega_{ref}t + \Phi_p) + I_y \sin (\omega_{ref}t + \Phi_p) ]$

To calculate the Hamiltonian in the rotating frame, we have – according to the above formula – to apply the conjugation with T to each of the terms appearing in this Hamiltonian and add a correction term.

Now the transformation T is a rotation around the z-axis, and the result of applying a rotation around the z-axis to the operators Ix and Iy is well known and in fact easy to calculate using the commutation relations between the Pauli matrices. The correction term cancels the first term of the Hamiltonian as above. The transformed Hamiltonian is then given by

$\tilde{H} = \omega_{nut} [I_x \cos \Phi_p + I_y \sin \Phi_p ]$

In other words, the time dependence has disappeared and only the phase term remains. Again, this is not really surprising – if we look at a rotating magnetic field from a frame of reference that is rotating around the same axis with the same frequency, the result is a constant magnetic field.

The density matrix of a single qubit

We are now ready to formally describe a single qubit, given by all nuclei at a specific position in a system consisting of a large number of molecules. According to the formalism of statistical quantum mechanics, this ensemble is described by a 2×2 density matrix $\rho$. The time evolution of this density matrix is governed by the Liouville-von Neumann equation

$i\hbar \frac{d}{dt} \rho = [H,\rho]$

The density matrix is a hermitian matrix with trace 1. Therefore the matrix $\rho - \frac{1}{2}$ is a traceless hermitian matrix. Any such matrix can be expressed as a linear combination of the Pauli matrices with real coefficients. Consequently, we can write

$\rho(t) = \frac{1}{2} + f(t) \cdot I$

where f is a three-vector with real coefficients and the dot product is a shorthand notation for

$f \cdot I = f_x I_x + f_y I_y + f_z I_z$

Similarly, the most general time-independent Hamiltonian can be written as

$H = \frac{1}{2} tr(H) + a \cdot I$

We can remove the trace by adding a constant, which does not change the physics and corresponds to a shift of the energy scale. Further, we can express the vector a as the product of a unit vector and a scalar. Thus the most general Hamiltonian we need to consider is

$H = \omega_{eff} n \cdot I$

with a real number $\omega_{eff}$ (the reason for this notation will become apparent in a second) and a unit vector n.

Let us now plug this into the Liouville equation. Applying the very useful general identity (which is easily proved by a direct calculation)

$[a\cdot I, b \cdot I] = i \hbar (a \times b) \cdot I$

for any two vectors a and b, we find that

$\dot{f} = - \omega_{eff} [f \times n]$

This equation is often called the Bloch equation. By splitting f into a component perpendicular to n and a component parallel to n, one can easily see that the solution is a rotation around the axis n with frequency $\omega_{eff}$.

What is the physical interpretation of this result and the physical meaning of f? To see this, let us calculate the expectation value of the magnetic moment induced by the spin of our system. The x-component of the magnetic moment, for instance, corresponds to the observable $\gamma I_x$. Therefore, according to the density matrix formalism, the expectation value of the x-component of the magnetic moment $\mu$ is

$\langle \mu_x \rangle = \gamma tr (\rho I_x)$

If we compute the matrix product $\rho I_x$ and use the fact that the trace of a product of two different Pauli matrices is zero, we find that the only term that contributes to the trace is the term fx, i.e.

$\langle \mu_x \rangle = \frac{\gamma \hbar^2}{4} f_x$

Similar calculations work for the other components and we find that, up to a constant, the vector f is the net magnetic moment of the probe.

A typical NMR experiment

After all these preparations, we now have all tools at our disposal to model the course of events during a typical NMR experiment in terms of the density matrix.

Let us first try to understand the initial state of the system. In a real world experiment, none of the qubits is fully isolated. In addition, the qubits interact, and they interact with the surroundings. We can model these interactions by treating the qubits as being in contact with a heat bath of constant temperature T. According to the rules of quantum statistical mechanics, the equilibrium state, i.e. the state into which the system settles down after some time, is given by the Boltzmann distribution, i.e.

$\rho(t=0) = \frac{1}{Z} \exp (- \frac{H}{kT})$

In the absence of an additional rotating field, the Hamiltonian in the laboratory frame is given by

$H = \omega I_z$

Therefore

$\rho(t=0) = \frac{1}{Z} \exp ( \frac{-\omega}{kT} I_z)$

Using the relations

$I_z = \frac{\hbar}{2} \sigma_z$

and

$\omega = -\gamma B \sigma_z$

with the gyromagnetic moment $\gamma$, we can write this as

$\rho(t=0) = \frac{1}{Z} \exp ( \frac{1}{2} {{\hbar \gamma B}\over{kT}} \sigma_z)$

Let us now introduce the energy ratio

$\beta = \frac{\hbar \gamma B}{kT}$

The energy in the numerator is the energy scale associated with the Larmor frequency. For a proton in a magnetic field of a few Tesla, for example, this will be in the order of 10-25 Joule. The energy in the denominator is the thermal energy. If the experiment is conducted at room temperature, say T=300K, then this energy is in the order of 10-21 Joule (see this notebook for some calculations). This yields a value of roughly 10-4 for $\beta$. If we calculate the exponential in the Boltzmann distribution by expanding into a power series, we can therefore neglect all terms except the constant term and the term linear in $\beta$. This gives the approximation

$\rho(t=0) = \frac{1}{Z} (1 + \frac{1}{2} \beta \sigma_z)$

called the high temperature approximation. We can determine the value of Z by calculating the trace and find that Z = 2, so that

$\rho(t=0) = \frac{1}{2} + \frac{1}{4} \beta \sigma_z = \begin{pmatrix} \frac{1}{2} + \frac{\beta}{4} & 0 \\ 0 & \frac{1}{2} - \frac{\beta}{4} \end{pmatrix} = \frac{1}{2} + \frac{1}{2} \frac{\beta}{\hbar} I_z$

If we compare this to the general form of the density matrix discussed above, we find that the thermal state has a net magnetization in the direction of the z-axis (for positive $\beta$). This is what we expect – with our sign conventions, the energy is lowest if the spin axis is in the direction of the z-axis, so that slightly more nuclei will have their spins oriented in this direction, leading to a net magnetic moment.

To calculate how this state changes over time, we again pass to the rotating frame of reference. As the initial density matrix clearly commutes with the rotation around the z-axis, the density matrix in the rotating frame is the same. If we choose the reference frequency to be exactly the Larmor frequency, the Hamiltonian given by the static magnetic field along the z-axis vanishes, and the density matrix does not change over time.

When, however, we apply an additional pulse, i.e. an additional rotating magnetic field, for some time $\tau$, this changes. We have already seen that in the rotating frame, this pulse adds an additional term

$\omega_{nut} (I_x \cos \Phi_p + I_y \sin \Phi_p)$

to the Hamiltonian. This has the form discussed above – a scalar times a dot product of a unit vector with the vector I = (Ix, Iy, Iz). Therefore, we find that the time evolution induced by this Hamiltonian is a rotation around the vector

$(\cos \Phi_p, \sin \Phi_p, 0)$

with the frequency $\omega_{nut}$. If, for instance, we choose $\Phi_p = 0$, the vector f – and thus the magnetic moment – will slowly rotate around the x-axis. If we turn off the pulse after a time $\tau$ such that

$\omega_{nut} \tau = \frac{\pi}{2}$

the net magnetization will be parallel to the y-axis. After the pulse has been turned off, the density matrix in the rotating frame is again constant, so the magnetic moment stays along the y-axis. In the laboratory frame, however, this corresponds to a magnetic moment which rotates with the Larmor frequency around the z-axis. This magnetic moment will induce a voltage in a coil placed in the x-y-plane which can be measured. The result will be an oscillating current, with frequency equal to the Larmor frequency. Over time, the state will slowly return into the thermal equilibrium, resulting in a decay of the oscillation.

This is a good point in time to visualize what is happening. Given the explicit formulas for the density matrix derived above, it is not difficult to numerically simulate the state changes and NMR signals during an experiment as the one that we have just described (if you want to take a look at the required code, you can find a Python notebook here)

The diagram below shows the result of such a simulation. Here, we have simulated a carbon nucleus in a TCE (Trichloroethylene) molecule. This molecule – pictured below (source Wikipedia) – has two central carbon nuclei.

A small percentage of all TCE molecules in a probe will have two 13C nuclei instead of the more common 12C nuclei, which have spin 1/2 and therefore should be visible in the NMR spectrum. At 11.74 Tesla, an isolated 13C carbon nucleus has a Larmor precession frequency of 125 MHz. However, when the nuclei are part of a larger molecule as in our case, each nucleus is shielded from an external magnetic field by the surrounding cloud of electrons. As the electron configuration for both nuclei is different, the observed Larmor frequencies differ by a small amount known as the chemical shift.

At the start of the simulation, the system was put into a thermal state. Then, an RF pulse was applied to flip the magnetization in the direction of the x-axis, and then a sample was taken over 0.1 seconds, resulting in the following signal.

The signal that we see looks at the first glance as expected. We see an oscillating signal with an amplitude that is slowly decaying. However, you might notice that the frequency of the oscillation is clearly not 125 MHz. Instead, the period is roughly 0.001 seconds, corresponding to a frequency of 1200 Hz. The reason for this is that in an NMR spectrometer, the circuit processing the received signal will typically apply a combination of a mixer and a low pass filter to effectively shift the frequency by an adjustable reference frequency. In our case, the reference frequency was adjusted to be 1200 Hz above the Larmor frequency of the carbon nucleus, so the signal will oscillate with a frequency of 1200 Hz. In practice, the reference frequency determines a window in the frequency space in which we can detect signals, and all frequencies outside this window will be suppressed by the low pass filter.

Now let us take a look at a more complicated signal. We again place the system in the thermal equilibrium state first, but then apply RF pulses to flip the spin of both carbon nuclei into the x-axis (in a simulation, this is easy, in a real experiment, this requires some thought, as the Larmor frequencies of these two carbon nuclei differ only by a small chemical shift of 900 Hz). We then again take a sample and plot the signal. The result is shown below.

This time, we see a superposition of two oscillations. The first oscillations is what we have already seen – an oscillation with 1200 Hz, which is the difference of the chosen reference frequency and the Larmor frequency of the first carbon. The second oscillation corresponds to a frequency of roughly 300 Hz. This is the signal caused by the Larmor precession of the second spin. As we again measure the difference between the real frequency and the reference frequency, we can conclude that this frequency differs from the Larmor frequency of the first spin by 900 Hz.

In reality, the signal that we observe is the superposition of many different oscillations and is not easy to interpret – even with a few oscillations, it soon becomes impossible to extract the frequencies by a graphical analysis as we have done it so far. Instead, one usually digitizes the signal and applies a Fourier transform (or, more precisely, a discrete Fourier transform). The following diagram shows the result of applying such a DFT to the signal above.

Here, we have shifted the x-axis by the difference between the Larmor frequency $\omega_0$ of the first nucleus and the reference frequency, so that the value zero corresponds to $\omega_0$. We clearly see two peaks. The first peak at zero, i.e. at Larmor frequency $\omega_0$, is the signal emitted by the first nucleus. The second peak is shifted by 900 Hz and is emitted by the second nucleus. In general, each nucleus will result in one peak (ignoring couplings that we will study in a later post) and the differences between the peaks belonging to nuclei of the same isotope are the chemical shifts.

Let us quickly summarize what we have learned. An ensemble of spin systems is described by a density which in turn is given by the net magnetization vector f. The result of applying a pulse to this state is a rotation around an axis given by the phase of the pulse (in fact, the phase can be adjusted to rotate around any axis in the x-y-plane, and as any rotation can be written as a decomposition of such rotations, we can generate an arbitrary rotation). The net magnetization can be measured by placing a coil close to the probe and measuring the induced voltage.

This does already look like we are able to produce a reasonable single qubit. The vector f appears to correspond – after some suitable normalization – to points on the Bloch sphere, and as we can realize rotations, we should be able to realize arbitrary single qubit quantum gates. But what about multiple qubits? Of course a molecule typically has more than one nucleus, and we could try to use additional nuclei to create additional qubits, but there is a problem – in order to realize multi-qubit gates, these qubits have to interact. In addition, we need to be able to prepare our NMR system in a useful initial state and, at the end of the computation, we need to measure the outcome. These main ingredients of NMR based quantum computing will be the subject of the next post.

Quantum error correction: the surface code

In my previous post on quantum error correction, we have looked at the toric code which is designed for a rather theoretical case – a grid of qubits on a torus. In reality, qubits are more likely to be arranged in a planar geometry. Luckily, a version of the toric codes that works well in a planar geometry exists – the surface code.

Planar codes and their stabilizers

Recall that the starting point for the toric code was a lattice L with periodic boundary conditions so that the geometry of the lattice becomes toroidal, which gave us a four-dimensional code space.

If we want to use this code in planar setting, it is clear that we somehow have to modify the boundary conditions. What happens if we simply drop it? A short calculation using a bit of algebraic topology shows that in this case, the code space collapses to a one-dimensional code space which is not enough to hold a logical qubit. So we need some a more sophisticated boundary condition. The solution found in [1] is to use two types of boundaries.

Look at the diagram above. The solid lines form a lattice L with two types of boundary – a “smooth” boundary at the top and the bottom and a “rough boundary” left and right (ignore the dashed lines for a moment, we will explain their meaning in a second). We again place qubits on the edges of the lattice, but for the boundary, we only place qubits (again indicated by black dots) on the edges which are part of the smooth boundary.

As for the case of a toric code, we can again create a second lattice called the dual lattice whose vertices are the center points of the faces of L and whose edges are perpendicular to the edges of L. This dual lattice is indicated by the dashed lines in our diagram. Again, some edges of the dual lattice carry qubits and others do not, so we have a smooth and a rough boundary as well (note that the smooth boundary of L gives raise to the rough boundary of the dual lattice and vice versa).

Again a path along the edges, i.e. a set of edges, is an element of a group, the group of one chains. If we only allow the edges that are not part of the rough boundary, i.e. the edges that carry qubits, we obtain a subgroup which is known to topologists as the group of one chains relative to the rough boundary – again I will strive to keep this post free from too much algebraic topology and refer the reader to my notes for more details and a more precise description.

As for the toric code, every such relative one-chain c gives raise to a operators Xc and Zc, obtained by letting X respectively Z act on any qubit crossed by the chain. Similarly, a co-chain is a path in the dual lattice, i.e. a path perpendicular to the edges of the lattice L.

In the literature, the lattice and its dual are typically visualized differently, namely with all edges that do not carry qubits removed, as in the following diagram.

Here we have removed the rough boundary at the left and right of the lattice and the same for the dual lattice. The red line is a relative one chain. Note that this one-chain is “open-ended”. Its boundary would be in the (removed) rough boundary. In the relative version of the chain complex, its boundary is zero, hence this is the equivalent to a closed loop in the toric geometry. Similarly, the blue line represents a co-chain, i.e. a chain in the dual lattice, which has zero boundary in the dual lattice.

We can now proceed as for the toric code. For each vertex v, there is a vertex operator Xv, which is the product of the Pauli X operators acting on the qubits sitting on the edges touching the vertex (which are four qubits in the interior of the lattice and three qubits if the vertex is sitting on the smooth boundary). For each face f, we again have a face operator Zf, acting as a Pauli Z operator on the four qubits sitting on the boundary edges of the face – again, if the face is located at the rough boundary, there are only three operators. These operators again commute with each other and create an abelian subgroup S. The code space TS created by this group, i.e. the space of all states left invariant by all face operators and all vertex operators, now turns out to be two-dimensional and is thus encoding one logical qubit. The code obtained in this way is called the surface code.

Again, there are logical Pauli operators. In fact, these are the operators created by the chains marked in the diagram above – the Z operator Zc associated with the one chain c and the X operator Xc* associated with the co-chain c*. The fact that these operators commute with each operator in S is related to their boundaries being zero (in the sense described above, i.e. the are open-ended) and the fact that they anti-commute is expressed geometrically by the fact that they intersect exactly once.

Thus we obtain a logical qubit and logical Pauli operators that act on this qubit. What about measurements? As for the toric code, we can measure our (data) bits located at the edges by adding measurement qubits. To measure a face operator Zf, we can place an additional measurement qubit in the center of the face and use this qubit as an ancilla in our measurement circuit below.

Similarly, to measure the vertex operator Xv for a vertex v, we place an additional measurement qubit at the vertex and use a similar circuit to transfer the result of the measurement into this qubit, as we did it for the toric code. The only difference to the case of a toric code is again that if a vertex or face is adjacent to only three qubits as it is located close to the boundary, we only include those qubits in our measurement. We therefore obtain the following arrangement of qubits, were black dots are data qubits, white dots are ancilla qubits used for the measurements and a letter X or Z indicates the action of a Pauli operator (this is, up to a switch of black and white, the graphical representation of a surface code used in [2]).

For instance, a white dot surrounded by Z’s indicates a face and the corresponding face operator Zf, together with the black data qubits on which the face operator acts. At the same time, the white dot is the location of a measurement qubit used to measure Zf. Similarly, a white dot surrounded by X’s indicates both a vertex operator and a data qubit to measure the vertex operator.

Of qubits and holes

So far we have seen how the surface code can be used to encode a single logical qubit in a whole array of physical qubits. In practice, however, we need more than one qubit. We could of course try to arrange the planar surfaces in a stack, with qubits on each plane interacting with the qubits above and below, and then use some variant of quantum teleportation to establish connectivity between the logical qubits. However, there is a different way – we can realize more than one logical qubit on one array of physical qubits (this is only one option – as of today, there are many different approaches built upon similar ideas, like color codes or lattice surgery – see [5] and [6] for a discussion of some alternatives).

The idea is simple. We have obtained our code space as the intersection of the subspaces given by constraints, where each face operator and each vertex operator add one (linear) constraint. To obtain a larger code space, all we have to do is to drop a few of these constraints again!

To understand how this works, we first have to understand how the surface code is usually used in practice ([2]). We first put the system into an initial, potentially unknown state. We then perform a series of measurements of all stabilizers. This will force the state into a simultaneous eigenstate of all the stabilizer operators.

Now, the eigenvalues will typically not all be one. We therefore have to apply corrections to move our state into the code space (or we could as well keep track of the error syndrome and perform the corrections in software when we measure) – this is exactly the same process as if we had found errors during the later computation. In other words, we do a first round of error correction.

We now start our quantum algorithm. After each step of the algorithm, we perform another round of measurements of the stabilizers to detect any errors that might have occurred. Again, we could now correct the errors, or alternatively keep track of them as well and correct the final measurements if needed. In this way, a computation with a surface code is essentially a periodic measurement of all stabilizers in repeating cycles followed by an (actual or virtual) error correction, and logical operations executed between any two subsequent cycles.

Now let us see what happens when we modify our code by removing a face operator Zf from the set of stabilizers. Thus we perform some initial cycles to put the system into a ground state $|g \rangle$ (here and in the sequel, we assume that errors are actually corrected and not just detected to simplify our calculations) on which all stabilizers act trivially, and starting with the next round of measurements, we exclude a Zf operator from the measurements.

Of course, this will enlarge the code space. Specifically, let us take a look at the upper left corner of the figure above. Here we have marked a face f and an operator Xc* coming from a string connecting this face with the rough boundary. Then the boundary of c* consists of one point in the dual lattice (the center of f), and hence every face operator commutes with Xc*, except Zf which anti-commutes. If we now define our code space to be the space fixed by all the vertex operators Xv and all face operators except Zf, then Xc* and Zf map that code space to itself. As c* intersects the boundary of f in one point, Zf and Xc* anti-commute and therefore define a set of logical Pauli operators, acting on the code space. Thus we have constructed a logical qubit!

Intuitively, removing the face operator Zf from the set of stabilizers corresponds to poking a “hole” into the surface, i.e. removing the face f and the data qubit in it. But of course we do not change the physical layout of our device at all – all we do is changing the code space and the set of logical Pauli operators. Our original code space, by the way, still exists, but we will tacitly ignore it and work with our new logical qubit.

This construction has a nice physical interpretation in terms of the particle picture that we have introduced in the post on the toric code. Suppose we create a pair of quasi-particles, supported on f and a second face close to the rough boundary. We have seen that we can move these particles by acting with edge operators Xe on them. Thus, we can move the second face on which the particle lives across the rough boundary and now obtain a particle supported on one face only. This is exactly the configuration which is obtained by acting with the logical Pauli X operator of our newly created logical qubit on the ground state, i.e. these logical qubits correspond to pairs of particles where one particle has been “pushed off” the surface!

As usual, this construction has a counterpart in the dual lattice, as shown on the right of of our diagram. The configuration marked in blue consists of a vertex operator Xv and a string c connecting that vertex to the smooth boundary. We can now remove Xv from the stabilizer, and obtain a pair of logical operators Xv and Zc. This type of logical qubit is called a single X-cut in [2], whereas the first type of qubit considered is a single Z-cut – other authors use the term defects for these logical qubits.

This technique allows us in theory to encode a large number of logical qubits in one surface. However, in practice, the restriction that we need to reach the boundary from each of the faces and vertices that we switch off is restricting our layout options a bit. Fortunately, it turns out that this is not even necessary.

In fact, coming back to the analogy of the quasi-particles, nobody forces us to move the second particle off the surface. To see this, consider the configuration marked in green in the lower part of the diagram above. Here we have removed two stabilizers from the code, the face operator Zf corresponding to the face on the left hand side, and the face operator Zf’ corresponding to the face on the right hand side. This will enlarge our code space by a space of dimension four. However, we now gracefully ignore half of that space and instead work with the space spanned by the Pauli operators Zf and Xc*.

A logical qubit obtained by this construction is called a double Z-cut. Of course, we can again repeat this construction in the dual lattice and obtain a double X-cut (these cuts are also for vegetarians…).

Let’s do the twist – moving and braiding qubits

Again, it is time to recap what we have done so far. We have seen that we can create logical qubits by removing face and vertex operators from the set of stabilizers. We have also seen that we can identify logical Pauli X and Z operators acting on these qubits. We can than, of course, also realize a logical Y as the product ZX. What about other gates? And specifically, what about multi-qubit gates?

It turns out that multi-qubit gates can be realized by moving logical qubits around the surface. To make this term a bit more precise, take a look at the following diagram.

Here we consider a logical qubit described by the Pauli operators Xc* and Zf, i.e the face operator Zf has been removed from the set of stabilizers. Adjacent to the face f, there is a face that we denote by $\bar{f}$. By removing $Z_{\bar{f}}$ (and adding Zf again to the set of stabilizers), we would obtain a different code space – we can think of this different code space as a deformation of the code space given by removing Zf.

We want to describe a process that starts with a state in the code space given by Zf and ends with a state in the code space given by $Z_{\bar{f}}$. To do this, we manipulate the original state in several steps. First, we measure the Pauli X operator Xe for the physical qubit sitting on the edge e of the dual lattice that connects the centre of f with that of $\bar{f}$ – this is the qubit on which both $Z_{\bar{f}}$ and Zf act. Let us assume for a moment that the outcome of this measurement is one. Next, we do another measurement – this time we measure Zf. This will force the system into an eigenstate of Zf. Let us assume again that the outcome of this measurement is plus one, so that the system is now in the new code space obtained by removing $Z_{\bar{f}}$ instead of Zf from the stabilizer.

This procedure gives us a mechanism to transform a state in the original code space into a state in the new code space. It is not difficult to see (again I refer to my notes for all the details) that this leaves the logical meaning of the state untouched, i.e. this transformation maps a logical one in the old codespace to a logical one in the new code space and the same for a logical zero (in fact, it turns out that we need to modify the Pauli operators a bit for this to work).

Of course we cannot directly compare the states before and after the transformation as they live in different code spaces. That, however, changes if we move a logical state around like this along a closed loop. Then we will obtain a new state which is living in the original code space and can compare it directly to the original state, as in the diagram below.

This diagram shows an example where we move a logical qubit generated by a Z-cut once around a logical qubit generated by an X-cut. If we hit upon the intersection between the loop and the X-cut, we pick up an additional transformation. This is a bit like a coordinate transformation – while we move along the loop, we constantly have to adjust our coordinate system, i.e. the logical Pauli operators, to preserve the logical meaning of the qubit. When we reach the initial position, we now have two coordinate system that we can compare, and we find that we have picked up a non-trivial transformation related to the fact that we cannot contract the loop without passing the hole in the surface creating our logical qubit. If you go through all the transformations carefully (or read my notes where I have tried to work out all the details – it is easy to get confused at this point), you will find that the transformation we pick up constitutes a CNOT gate between the two involved qubits! Thus a CNOT can be realized by moving one logical qubit around the second logical qubit once. This is often visualized by a two-strand braid, as illustrated below.

We have also seen that the transformation given by such a braiding operation is governed by terms like intersection numbers and contracting loop, i.e. the transformation is topological in nature – a slightly deformed braid or path gives the same result. Therefore this operation has a certain natural protection against faults, it is topologically protected. Unfortunately, the surface code does not allow a representation of a universal set of quantum gates by braids – doing this requires the use of a more sophisticated physical equivalent called non-abelian anyons. For the surface code, for instance, the T gate cannot be represented by topological operations and needs to be implemented using a technique called gate teleportation.

We will not go into details on this, but try to quickly describe the basic idea which is based on the observation that the circuit

can be used to realize the T gate.

How does this circuit work? The upper qubit is an ancilla qubit that is initialized with the specific state

$|0\rangle + e^{i \frac{pi}{4}} |1\rangle$

by a process called magic state distillation (see [2] for details). Now a short calculation shows that after applying the CNOT gate, the combined system of both qubits will be in the state

$T|\psi \rangle \otimes |0 \rangle + (TX |\psi \rangle) \otimes |1 \rangle$

Thus we have successfully transferred the state $|\psi \rangle$ into the first qubit (this is why this and similar circuits are called gate teleportation circuits), but still have a superposition. To remove this superposition, we nown perform a measurement MZ on the second qubit. If the outcome of this measurement is one, the first qubit will be in the state $T|\psi \rangle$ and we are done. If not, i.e. if the measurement results in minus one, then the above formula shows that the resulting state in the first qubit will be $TX |\psi \rangle$. Now one can easily verify that

$TX = X T^*$

and

$T = S T^*$

so that if we apply the sequence SX to the first qubit conditioned on the outcome of the measurement, as indicated in our circuit diagram, the result will in both cases be $T|\psi \rangle$.

By now, our discussion should have made it clear that the surface codes create a significant overhead. To illustrate this, the appendix of [2] contains an estimation of the number of physical qubits needed to implement Shor’s factoring algorithm for a 2048 bit RSA key. Still, most operations on a surface code are topologically protected, which makes surface codes rather robust. Therefore surface codes remain a promising technology to implement universal fault-tolerant quantum computation, and are one of the most active areas of current research in applied quantum computing.

References

1. S. Bravyi, A. Kitaev, Quantum codes on a lattice with boundary, arXiv:quant-ph/9811052
2. A.G. Fowler, M. Mariantoni, J.M. Martinis, A.N. Cleland, Surface codes: Towards practical large-scale quantum computation, arXiv:1208.0928v2
3. A.G. Fowler, A.M. Stephens, P. Groszkowski, High threshold universal quantum computation on the surface code, arXiv:0803.0272
4. E. Dennis, A. Kitaev, A. Landahl, J. Preskill, Topological quantum memory, arXiv:quant-ph/0110143
5. T.J. Yoder, I.H. Kim, The surface code with a twist, Quantum Vol. 1, 2017 (available online)
6. A. Javadi-Abhari et. al., Optimized Surface Code Communication in Superconducting Quantum Computers, arXiv:1708.09283

Â

Quantum phase estimation – the quantum algorithm Swiss army knife

When you are faced with a problem in linear algebra and have absolutely no idea what to do, an eigenvalue decomposition is the one thing that you would typically try first. In the world of quantum algorithms, the situation is similar – finding the eigenvalues of a matrix is a central building block of many quantum algorithm. The most important approach to finding the eigenvalues is called the quantum phase estimation algorithm.

The quantum phase estimation algorithm

In its most basic form, the quantum phase algorithm is used to find an eigenvalue of a unitary matrix assuming that we know an eigenvector (we will see later that this is in fact not necessarily needed for useful applications). So let us assume that we are given a unitary matrix U, implemented as a quantum circuit, and a quantum state $|\psi \rangle$ which is an eigenvector of U. As U is unitary, all eigenvalues are on the complex unit circle, and therefore the eigenvalue equation is

$U |\psi \rangle = e^{2\pi i \varphi} |\psi \rangle$

for some number $\varphi$ between zero and one. The quantum phase estimation is designed to determine (or at least approximate) the number $\varphi$, i.e. the phase of the eigenvalue (hence the name).

The algorithm consists of two parts. Both parts operate on an ancillary working register and a register that we call the primary register. Let n denote the number of qubits in the primary register and m denote the number of qubits in the working register. We assume that the initial state in the primary register is the state $|\psi \rangle$. The first step of the algorithm is to prepare the system in the state

$\frac{1}{\sqrt{2^m}} \sum_{k=0}^{2^m - 1} |k \rangle U^k |\psi \rangle$

How can we prepare this state? Given a state $|k \rangle$, with binary digits ki, we can of course write Uk as

$U^k = \prod_i \big[ U^{2^i} \big]^{k_i}$

In other words, Uk is a sequence of conditional gates U2i, conditioned on the i-th bit of k. Together with the usual approach to prepare an equal superposition of all $|k \rangle$ by using Hadamard gates, this argument shows that the desired state can be prepared by the following quantum circuit.

So far we have not yet used that the initial state $|\psi \rangle$ is an eigenstate. Let us do this now. We know that a power of U will act on $|\psi \rangle$ as multiplication by the corresponding power of the eigenvalue, hence

$\frac{1}{\sqrt{2^m}} \sum_{k=0}^{2^m - 1} |k \rangle U^k |\psi \rangle = \frac{1}{\sqrt{2^m}} \big[\sum_{k=0}^{2^m - 1} e^{2\pi i \varphi k} |k \rangle \big] \otimes |\psi \rangle$

This is interesting – our state turns out to be a product state (as the multiplication by a scalar can be pulled through the tensor product). To understand the first part of the product (i.e. the resulting state in the working register) in more detail, let us assume for a moment that the phase $\varphi$ is an exact multiple of 2-m, i.e. that

$\varphi = \frac{t}{2^m}$

for some integer t. Then the state in the working register is

$\frac{1}{\sqrt{2^m}} \sum_{k=0}^{2^m - 1} e^{2\pi i \frac{tk}{2^m}} |k \rangle$

But this is simply the inverse quantum Fourier transform applied to the state $|t \rangle$! So we can obtain t and therefore $\varphi$ by applying a quantum Fourier transform to the state in the working register and doing a measurement. This is the quantum phase estimation algorithm.

In most cases, the phase will of course not be an exact multiple of t / 2m. However, with a bit more work (see for instance arXiv:quant-ph/9708016 or my own notes) one can see that the our result is still approximately true, i.e. if we run this algorithm, measure the value in the working register and divide by 2m, we get a good approximation to the actual phase $\varphi$. The precision of the estimation depends on the number of qubits m that we have in our working register. The following diagram demonstrates this for a specific example for two different values of m.

The x-axis shows the absolute deviation from the true value, the y-axis shows the probability for this deviation. The upper image corresponds to m=3. We see that there is a peak around zero, but still a substantial probability to have a significant error. The lower diagram shows the outcome for m=8. Here we see that the peak is very sharp, and so we obtain a very good approximation with high probability.

Shor’s algorithm as an instance of quantum phase estimation

Most of the time, the quantum phase estimation algorithm is used as a subroutine in a more general algorithm. A prominent example is Shor’s algorithm which – as we will now see – can be expressed as an application of the quantum phase estimation.

Recall that the quantum part of Shor’s algorithm is concerned with finding the period of a number a modulo M, where M is some large integer that we want to factor. Here we assume that a is a unit modulo M so that one of its powers is one. As a is a unit, the mapping

$U \colon |x \rangle \mapsto |ax \mod M\rangle$

is a unitary transformation U for x < M (which we can extend by the identity to a unitary transformation of the full Hilbert space). If r denotes the period of U, then clearly the fact that ar = 1 modulo M implies that Ur = 1. Thus all eigenvalues of U are r-th roots of unity. In other words, if an eigenvalue has phase $\varphi$, then $\varphi r$ is an integer. We can therefore use the phase estimation algorithm to obtain the period r.

There is one twist, however. To be able to apply the phase estimation algorithm, we need an eigenvector as an initial state, and it is far from clear how to prepare this. Fortunately, a deeper analysis (again see arXiv:quant-ph/9708016 or my own notes for all the details of this) shows that the state $|1 \rangle$ is a sum of eigenstates. Running the algorithm on this state will – by linearity – produce a state which is a superposition of states corresponds to different multiples of 2m / r, and performing the measurement will choose one of these multiples. As in Shor’s original version of the algorithm, we can then use a continued fraction expansion to obtain the unknown period r. Thus, the following algorithm can be used to determine the period r.

1. Prepare a primary register in the state $|1 \rangle$
2. Apply the circuit shown above
3. Apply a quantum Fourier transform to the working register
4. Perform a measurement of the working register and call the result s – this will be close to a multiple of 2m / r
5. Perform a continued fraction expansion to find the period r as in Shor’s original version of the algorithm

At the first glance, it looks as if we had found an alternative period finding algorithm, but if you actually draw up the circuit that realizes this algorithm, you will find – again see the references cited before – that this is not true, in fact the resulting circuit will be almost identical to the circuit for Shor’s version of the algorithm. Thus we have not found a new algorithm, but have found a different description of the same algorithm – period finding can be described as an instance of the phase estimation algorithm.

More applications

Of course factoring integers is not the only possible application of the quantum phase estimation. An obvious application is finding eigenvalues of Hamiltonians, i.e. energy eigenvalues – and in particular ground state energies – of physical systems. We have already touched on this possible application in a previous post on the quantum variational eigensolver.

Another application is based on the observation that if we start the algorithm with a superposition of eigenvectors, the measurement at the end of the algorithm will not only select an eigenvalue, but also an eigenvector. Thus we can use the algorithm to decompose a state into a superposition of eigenvectors. This can be used to solve linear equations of the form Ax = b for a given matrix A. In fact, if we can diagonalize A, then solving this equation becomes equivalent to division. Implementing this as a quantum algorithm is not as simple and straightforward as this description might suggest, as the division is a non-unitary operation, but it turns out that it can be approximated by a unitary transformation. The resulting algorithm is called the HHL-algorithm after its inventors Harrow, Hassidim and Lloyd, and has been published in this paper on the arxiv.

The HHL algorithm does not guarantee a quantum advantage for every matrix A, as it relies on a way to efficiently calculate the corresponding time evolution operators eiAt. However, if for instance A is sparse and has low condition number (ratio between eigenvalues), the algorithm provides an exponential speedup over the best classical algorithms. An important application is deep machine learning, where efficient matrix inversion plays an important role – see for instance this paper by Zhao, Pozas-Kerstjens, Rebentrost, and Witte, as well as the corresponding source code. The details of the HHL algorithm are far from being trivial, and we will leave a more in-depth discussion to a future post.

Quantum error correction with stabilizer codes

In our previous discussion of quantum error correction, we have assumed that quantum gates can act on any two physical qubits. In reality, however, this is not true – only nearby qubits and interact, and our error correction needs to take the geometric arrangements of the qubits into account. The link between these geometric constraints and the theory of quantum error correction is the stabilizer formalism described first in [2].

This post will be a bit more formal, but is a necessary preparation to be able to define and understand the surface code and other topological codes. To introduce and motivate the formalism, let us for a moment come back to the simple example studied before – the three-bit code. In this code, the logical states were

$|0 \rangle_L = |000 \rangle$

and

$|1 \rangle_L = |111 \rangle$

These logical states span our two-dimensional code space and describe one logical qubit. A single bit flip error will flip one of the three involved qubits. Suppose, for instance, that a bit flip occurs on the first or second qubit. This error will modify our logical states as follows (the first two lines represent a bit flip on the first qubit, the next two lines a bit flip on the second qubit

$|0 \rangle_L \mapsto |100 \rangle$

$|1 \rangle_L \mapsto |011 \rangle$

$|0 \rangle_L \mapsto |010 \rangle$

$|1 \rangle_L \mapsto |101 \rangle$

Now consider the operator

$M_1 = \sigma_z^1 \sigma_z^2$

where the upper index on a Pauli matrix indicates the qubit on which the operator acts. The error-free logical states are both eigenstates of this operator with eigenvalue +1. The states that result out of a bit-flip error are also eigenstates of this operator, but with eigenvalue -1. Thus, a measurement of this operator will tell us whether a bit flip on the first or second qubit has occurred. Similarly, to also detect a bit flip on the third qubit, we need a measurement for a second operator, namely

$M_2 = \sigma_z^1 \sigma_z^3$

Let us now write down a few properties of these operators. First, they are hermitian and therefore correspond to measurements. Second, the logical states are eigenvectors of these operators with eigenvalues +1 and the code space is exactly the subspace of the three-qubit state space that is fixed by M1 and M2. We can think of these operators as defining linear constraints that together define the code space, as indicated below.

Moreover, if a bit flip error occurs, the resulting state will again be an eigenvalue, but for at least one of the operators the new eigenvalue will be -1. Thus, errors correspond to violated constraints. Finally, the two operators commute and can therefore be measured simultaneously.

These properties allow us to express the three bit code entirely in terms of the operators M1 and M2. The code space is the subspace which is left invariant by these operators. The syndrome measurement can be done by measuring both operators simultaneously, which is possible because they commute. Finally, all involved states – code space and states after bit flip errors – are eigenstates and therefore the measurement process does not collapse a superposition of these states and can therefore be executed without interfering with the actual quantum algorithm that we want to run.

This correspondence between sets of operators – the Mi in our case – and quantum error correction is the core of the stabilizer formalism. In a more general setting, we are looking at the Hilbert space of dimension 2n spanned by n physical qubits. Products of n Pauli matrices (where we include the identity matrix) are acting on this Hilbert space and form a group ${\mathcal G}_n$ called the Pauli group. Put differently, the Pauli group consists of those linear operators on the Hilbert space that can be written as a product

$g = \mu (\sigma_x^1)^{a_1} (\sigma_y^1)^{b_1} (\sigma_z^1)^{c_1}\cdots (\sigma_x^n)^{a_n}(\sigma_y^n)^{b_n} (\sigma_z^n)^{c_n}$

with coefficients $a_i, b_ic, c_i$ and an overall phase factor $\mu \in \{ \pm 1, \pm i \}$ (we can even assume that all the $b_i$ are equal to one as $\sigma_x \sigma_z$ is a multiple of $\sigma_y$). Any two elements of the Pauli group either commute or anti-commute. Within this group, we now consider a finite set $\{ M_i \}$ of commuting elements and the subgroup S of the Pauli group generated by this set. This group (which is abelian as it is generated by mutually commuting elements) is called the stabilizer group. To this group, we can associate the subspace that consists of all vectors that are fixed by all elements of the group, i.e. the space

$T_S = \{ |\psi \rangle \, \text{such that} \, s|\psi \rangle = |\psi \rangle \forall s \in S \}$

This space will be the code space of our code. If the group S is generated by l independent generators, the dimension of the code space can be seen to be 2n-l.

In the example of the three bit code, we had n=3 and l=2, which gives us a two-dimensional code space, corresponding to one logical qubit (if you try work out the details, you will see that in order for this to be true, we have to assume that $-1 \notin S$ – you might want to take a look at my notes or the chapters on quantum error correction in [1] for the mathematical details).

Which errors is our code able to detect? Suppose that E is an error operator that is itself in the Pauli group. We know that any two elements of the Pauli group either commute or anti-commute. Let us assume that E does in fact anti-commute with some element s of S. If $|\psi \rangle$ is a state in the code space, we can then calculate

$s E |\psi \rangle = - E s |\psi \rangle = - E |\psi \rangle$

Therefore the state $E |\psi \rangle$ is now in the -1 eigenspace of s. Thus if we measure all elements of S, the outcome -1 for one of the measurements will tell us that an error has occurred. A similar argument shows that if the product of any two errors anti-commutes with at least one element of S, then we can also correct the error based on only the measurement results of elements in S. Mathematically speaking, the set of all elements of the Pauli group that commute with all elements of S is the centralizer of S, which in this case turns out to be equal to the normalizer N(S) of S, and a set $\{E_\alpha \}$ of errors can be detected and corrected if and only if

$E_\alpha E_\beta \in S \cup (\mathcal{G} - N(S))$

for any two error operators $E_\alpha, E_\beta$. So the elements of the Pauli group that are neither in S nor in N(S) correspond to correctable errors.

What do the elements of the normalizer correspond to? Suppose that n is an element in the normalizer and therefore commutes with all elements of S. Then, for any element s in S and any element $|\psi \rangle$ of the code space, we have

$s n |\psi \rangle = n s |\psi \rangle = n |\psi \rangle$

Consequently, the element $n |\psi \rangle$ is again fixed by all elements of S and therefore in the code space TS. Thus the elements of N(S) generate automorphisms of the code space, i.e. logical operations. It is one of the strengths of the stabilizer formalism that once we have the stabilizer group S, we can not only derive the code space and the correctable errors, but can also use group theoretic considerations to describe logical operations – this will become clearer as we study surface codes in a later post.

For later reference, let us briefly summarize what we have found. Given l independent generators si of the stabilizer S, we can define a code space as the set of all vectors that are fixed by the si. This subspace will encode n – l logical qubits in n physical qubits. To detect an error, we measure all the si. Each measurement will give us minus one or plus one. Any occurrence of minus one indicates an error (this is why the combined result of all measurements is usually called the error syndrome) and will also tell us which operation we have to apply to correct the error. To implement logical quantum gates on our code space, we can apply elements in the normalizer N(S) that map the code space into itself. Thus we have expressed error detection, error correction and logical operations entirely in terms of the stabilizer group and the language of group theory.

But how do we find useful stabilizer codes? One approach is given by the check matrix formalism, which allows us to express stabilizer codes in terms matrices and metrics, i.e. in terms of linear algebra. This approach is generalizing the parity check matrix from the classical theory of error correction. A second source of stabilizers, however, comes from a more surprising direction – algebraic topology. In fact, given a surface described in terms of a cell complex, we can associate elements of the Pauli group with every cycle and every co-cycle. For certain choices of cycles and cocycles, this will give us abelian subgroups of the Pauli group which in turn create error correction codes. This is how geometrical constraints in the actual implementation enter the scene and leads to a class of codes known as surface codes and toric codes that we will start to study in my next post.

1. E. Rieffel, W. Polak, Quantum computing – a gentle introduction, MIT Press, Cambridge 2011
2. D. Gottesman, Stabilizer Codes and Quantum Error Correction, Caltech Ph.D. Thesis, available as arXiv:quant-ph/9705052
3. J. Kempe, Approaches to quantum error correction, S\’eminaire Poincar\’e 1 (2005), pp. 65–93, available as arXiv:quant-ph/0612185

Basics of quantum error correction

Do usable universal quantum computers exist today? If you follow the recent press releases, you might believe that the answer is “yes”, with IBM announcing a 50 qubit quantum computer and Google promoting its Bristlecone architecture with up to 72 qubits. Unfortunately, the world is more complicated than this – time to demystify the hype a bit.

The need for error correction

The important point is that it is not just the number of qubits that matters, but also their quality. When we study quantum algorithms like Shor’s algorithm, we are working with idealized qubits that behave exactly the way an isolated two-state quantum system is supposed to behave – these idealized qubits are often called logical qubits. However, in a real world implementation, there is no fully isolated two-state quantum system. Every system interacts to some extent with the environment, an interaction that we can try to reduce to a minimum, for instance by cooling our device down to very low temperatures, but never fully avoid. A trapped ion could, for instance, interact with radiation entering our device from the environment, and suddenly is part of a larger quantum system, consisting of the ion, the photons making up the radiation and maybe even the source of the photon. This will introduce errors into our system, i.e. deviations of the behavior of the system from the idealized theoretical model.

In addition to unwanted interactions with the environment, other errors could creep into our computation. When manipulating qubits to realize gates, we might make mistakes, for instance by directing a microwave pulse with a slightly incorrect frequency at our qubit, and we can make mistakes during each measurement.

Thus the real qubits in a quantum computer – called physical qubits – are prone to errors. These errors might be small for one qubit, but they tend to propagate through the circuit and add up to a significant error that will render the result of our quantum computation unusable. Thus we need error correction, i.e. the ability to detect and correct errors during our computation.

So how would you do this? Of course, errors can also occur in classical systems, and there are well developed methods to detect and correct them. Unfortunately, these approaches typically rely on the ability to copy and measure individual bits. This is easy for a classical bit, but more complicated for a qubit, as a measurement will collapse our system into an eigenstate of the observed operator and thus interfere with quantum algorithm.

The good news is that quantum error correction is still possible. In this post, I will try to explain the basics, before we then dive into more advanced topics in the next few posts.

Encoding logical states

In order to understand the basic ideas and structures behind quantum error correction, it is useful to study a simplified example – the three qubit code. To introduce this code, let us suppose that we have access to a communication channel across which we can send individual qubits from one quantum device to another one. Suppose further that this transmission is not perfect, but is subject to a bit flip error with a probability p. Thus, with probability p, a one-qubit state $|\psi \rangle$ will be changed to $X |\psi \rangle$ during transmission, where $X$ is the usual bit flip operator, and with probability 1-p, the transmission does not change the state. The aim is to construct an encoding of a qubit such that these errors can be detected and corrected.

To achieve this, we encode every single qubit state in a three-qubit state before transmitting it. Thus we use the following encoding

$|0 \rangle \mapsto |000 \rangle$

$|1 \rangle \mapsto |111 \rangle$

It is not difficult to see that this encoding can in fact be realized by a unitary circuit – the circuit below will do the trick.

To transmit one qubit, we use this encoding to obtain a three-qubit message. We then send those three qubits through our communication channel. After the transmission, we apply a procedure known as syndrome measurement, using the following circuit.

Let us see what this circuit is doing. First, let us suppose that the original qubit was $|0 \rangle$, encoded as $|000 \rangle$, and no error did occur during the transmission. Then the three qubits at the top of the circuit will still be $|000 \rangle$. In this case, the CNOT gates act as the identity, and the overall state after passing the circuit is

$|000 \rangle |00 \rangle$

Similarly, if the original qubit is $|1 \rangle$ and no error occurred, all CNOT gates will act as inversion. Thus the ancilla qubits will be inverted twice, and we end up with the state

$|111 \rangle |00 \rangle$

The situation is a bit different if a bit flip error has affected one of the qubits during the transmission, say the first one. Suppose the original state was again $|0 \rangle$. After encoding and transmission with a bit flip on the first qubit, we will receive the state $|100 \rangle$. Therefore both ancilla qubits will be inverted, and we obtain the state

$|100 \rangle |11 \rangle$

Let us now generalize these considerations. Assume that we are encoding the state $a |0 \rangle + b |1 \rangle$, so that the encoded state will be $a |000 \rangle + b |111 \rangle$. When we go through the above exercise for all possible cases, we arrive at the following table that shows the transmission error and the resulting state after passing the syndrome measurement circuit.

Error Resulting state
No error $(a |000 \rangle + b |111 \rangle ) |00 \rangle$
Bit flip on first qubit $(a |100 \rangle + b |011 \rangle ) |11 \rangle$
Bit flip on second qubit $(a |010\rangle + b |101 \rangle ) |10 \rangle$
Bit flip on third qubit $(a |001\rangle + b |110 \rangle ) |01 \rangle$

Now let us see what happens if we measure the ancilla qubits. First, note that all the states are already eigenstates for the corresponding measurement operator. Thus measuring the ancilla qubits will not change the state of the first three qubits and it will not reveal any information on the encoded state. The second important observation is that the value of the ancilla qubits tells us the exact error that has occurred. Thus we have found a way to not only find out that an error has occurred without destroying our superposition, but also to figure out which qubit was flipped. Given that information, we can now apply a bit flip operator once more to the affected qubit to correct the error. Again, this will not reveal the values of a and b and not collapse our state, and we can therefore continue to work with the encoded quantum state, for instance by running it through the inverse of the encoding circuit to get our original state back.

More general error models

So what we have found so far is that it is possible, also in the quantum world, to detect and protect against pure bit flip errors without destroying the superposition. But there is more we can learn from that example. In fact, let us revisit our original assumption that the only thing that can go wrong is that the operator X is applied to some of the qubits and allow a more general operator. Suppose, for instance, that our error is represented by applying to at most one of our qubits the operator

$E = (1-\epsilon) 1 + \epsilon X$

for a small $\epsilon$. In contrast to our earlier assumption that the error operator is discrete, i.e. is either applied or not applied, this operator is now continuous, depending on the parameter $\epsilon$, i.e. it looks as if we had to deal with a full continuous spectrum of errors. A short calculation shows that after transmitting and applying the syndrome measurement circuit above, the state of our quantum system will now be

$(1-\epsilon)(\alpha |000 \rangle + \beta |111 \rangle)|00\rangle + \epsilon (\alpha |100 \rangle + \beta |011\rangle) |11 \rangle$

Now let us again apply a measurement of the ancilla qubits. Then, according to the laws of quantum mechanics, the system will collapse onto an eigenstate, i.e. it will – up to normalization – end up in one of the states

$(\alpha |000 \rangle + \beta |111 \rangle)|00\rangle$

and

$(\alpha |100 \rangle + \beta |011\rangle) |11 \rangle$

But these are exactly the states in which we end up if no error occurs or a single bit flip error occurs. Thus, our measurement forces the system to somehow decide whether an error occurred or not and if yes, which error occurred – we are opening the box in which SchrÃ¶dingers cat is hidden. This is a very important observation often referred to as the digitization of errors – it suffices to protect against discrete errors as the syndrome measurement will collapse any superposition of different errors states.

So far we have worked with a code which is able to protect against a bit flip error. But of course this is not the only type of error that can occur. At the first glance, it looks like there is a vast universe of potential errors that we have to account for, as in theory, the error could be any unitary operator. However, using arguments similar to the discussion in the last paragraph, one can show that it suffices to protect against two types of errors: the bit flip error discussed above and the phase flip error, represented by the matrix

$Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$

Note that the phase flip error has no classical equivalent, other than the bit flip error which can classically be interpreted as a bit flipping from one to zero or vice versa randomly. The first error correction code that was able to handle both, bit flip errors and phase flip errors (and combinations thereof, ), and therefore any possible type of error (as long as the number of errors is limited) was described in 1995 by P. Shor. This code uses nine qubits to encode one logical qubit and is somehow a repeated application of the three bit code, based on the observation that the Hadamard transform turns a bit flip error into a phase flip error and vice versa. Later, different types of codes were discovered that are also universal in the sense that they protect against any potential one qubit error, i.e. against any error that only affects one qubit.

Let us now look at the structure that these codes have in common. First, the encoding can be described as identifying a 2-dimensional subspace C, called the code space, in the larger space spanned by all qubits. In the case of the three qubit code, the code space is spanned by the states $|000 \rangle$ and $|111 \rangle$, corresponding to a logical zero and a logical one. More generally, the space C can have dimension 2n and the full Hilbert space can have dimension 2k, in which case we can encode n qubits in a larger set of k qubits (in the nine qubit code example, k = 9 and n = 1).

Hence the code space is spanned by a set of 2n basis vectors $|\psi_i \rangle$ that we call code words. In the case n = 1, it is common practice to denote the codewords by $|0 \rangle_L$ and $|1 \rangle_L$ to indicate that they represent a logical qubit encoded using k physical qubits.

In addition, there is a set of error operators, i.e. a finite set of operators $\{ E_\alpha\}$ like the phase flip or bit flip operators acting on the larger Hilbert space. These operators represent the impact of discretized noise and will generally move the code words out of the code space, i.e. they will rotate the code space onto different subspaces of the entire Hilbert space.

In order for a code to be useful, we of course need a relation between code space and errors that tells us that the errors can be detected and corrected. What are these conditions? A first condition is that no matter which errors occur, we can still tell the code words apart. This is guaranteed if the various error operators map different code words onto mutually orthogonal subspaces, in other words if

$\langle \psi_i | E_\alpha^{+} E_\beta | \psi_j \rangle = 0$

whenever $i \neq j$ and for all $\alpha, \beta$. Thus, even in the presence of errors, the different code words will never overlap.

What about the case that both code words are the same? It is tempting to ask for the condition

$\langle \psi_i | E_\alpha^{+} E_\beta | \psi_i \rangle = \delta_{\alpha \beta}$

i.e. to require that different errors map the same codeword to orthogonal subspaces. This would make recovery very easy. We could perform the measurements that correspond to projections onto these subspaces to detect the error and correct them by applying the inverse of the error operator. However, this condition is too restrictive. In the case of the nine qubit code, for example, it might very well happen that two different errors map a code word to the same state. However, the same applies for the correction, i.e. we do not have to distinguish between these two errors as the act similarly on the code space. Therefore, a more general condition is usually used which captures this case:

$\langle \psi_i | E_\alpha^{+}E_\beta | \psi_j \rangle = C_{\alpha\beta} \delta_{ij}$

for a hermitian matrix $C_{\alpha \beta}$. If the matrix has full rank, the code is called non-degenerate, otherwise – as in the case of the nine qubit code – the code is called degenerate.

Of course this encoding generates some overhead. To represent one logical qubit, we need more than one physical qubit. For the Shor code, we have to use nine physical qubits to encode one logical qubit. It is natural to ask what the minimum overhead is that we need. In 1996, Steane discovered a code that requires only seven physical qubits. In the same year, a code that requires only five qubits was presented and it was shown that this is a lower bound, i.e. there is no error correction code that requires less than five qubits.

So there is an unavoidable overhead – and the situation is even worse. To implement error correction, you need again quantum gates, which can of course also experience errors. Thus you need additional circuitry to protect the error correction against errors, which can again introduce errors and so forth. That there is a way out of this vicious circle is not obvious and the content of the famous threshold theorem that we will study in a later post – but even this way out is very hard to implement and might require thousands of physical qubits to implement one single logical qubit.

So even with a 72 qubit device, we are still far away from implementing only one logical qubit – and having a few thousand logical qubits to use Short’s algorithm to break an RSA key with a realistic key length is yet another story. So it is probably a good idea to take claims about universal supremacy of quantum computing within a few years with a grain of salt.

In this post, we have looked at some of the essential ideas behind quantum error correction, i.e. the ability to detect and correct errors. However, this is not enough to build a reliable quantum computer – after all, adding an error correction circuit introduces additional qubits that can also create new errors. In addition, we need to be able to perform calculations on our encoded states. So there is more that we need for fault-tolerant quantum computing which is the topic of the next post.

Into the quantum lab – first steps with IBMs Q experience

Even though physical implementations of quantum computers make considerable progress, it is not likely that you will have one of them under your desk in the next couple of years. Fortunately, some firms like IBM and Rigetti have decided to make some of their quantum devices available only so that you can play with them. In this post, we will have a first look at IBMs cloud offering – the IBM Q experience.

IBM offers different ways to interact with their quantum computers. The easiest access is provided by the interactive composer which allows you to create a quantum circuit online and to run it on a 5 qubit quantum computer based on superconducting qubits after registering.

The screenshot below shows the editor screen. At the top, you find an overview of the available quantum devices, both realising five (physical) qubits, and some of the main characteristics like error rates and decoherence times.

In the lower area of the screen, you can compose quantum circuits graphically. You can add quantum gates that manipulate the five qubits q[0] – q[4] and measurements. After setting up your circuit, you can either simulate the circuit or you can run it on the actual device. It is also possible to export the circuit as a QASM code snippet and to save experiments and results as so-called quantum scores.

Let us now run this circuit on the actual quantum hardware. Doing this consumes units, i.e. credits. A standard run consists of 1024 repetitions and one measurement after each repetition and consumes three units (which are replenished after the execution has completed). The execution is queued and, typically after a few minutes, the results are made available online (and you receive a mail informing you about the execution). When running a circuit, the platform also checks whether that circuit was executed before (by you our someone else) and if yes offers you immediate access to the previous results without consuming any units. The screenshot below shows the result of one of these runs.

The interesting part is the diagram at the top that shows the measurement outcomes in the computational basis, i.e. for each member of the computational basis on the x-axis, the value shows the percentage of measurements having that basis element as result. In this example, we see that almost always, the result is 00000, as expected from the circuit in which we do a double inversion of the first qubit. However, in some cases – corresponding to 1,3 percent of all runs – the outcome is 00001. This is a nice example for an error that occurs whenever we switch from a simulator to a real physical device. Note that in the IBM Q experience, the qubit q[0] is the least significant qubit and the rightmost qubit in the notation of the computational basis.

Now let us try something else – we want to put our system into an equal superposition

$\sum_i |i \rangle$

which is the usual starting point for most quantum algorithms. We know that we can achieve this by applying a Hadamard gate on each qubit. The following screenshot shows the resulting circuit for three qubits and the results of a (cached) run.

Â

As expected, we see that our measurement results are spread across eight (23) vectors of the computational basis. However, we also see that again, the result does not exactly match the theoretical prediction – the values are not exactly equally distributed but we see slight deviations as before.

Finally, let us try out a multi-qubit gate. The screenshot below shows the result of a run with two CNOT gates. The first CNOT gate receives a logical one on its control bit, the second CNOT gate a logical zero. Thus, the result should be 00011, i.e. the first (least significant) qubits are inverted. In reality, we again get this result plus some noise, represented by unexpected outcomes in the measurements.

Playing with the composer is fun, and makes it easy to create simple circuits from scratch. However, more complicated circuits soon become difficult to define in this way. For those circuits, the IBM platform offers an API that can be used to run experiments using Python scripts that can much easier be developed and debugged step by step. We will look at this option in a later post. Until then, I recommend you get an account and create some circuits yourself – happy hacking!

Shor’s quantum factoring algorithm

Until the nineties of the last century, quantum computing seemed to be an interesting theoretical possibility, but it was far from clear whether it could be useful to tackle computationally hard problems with high relevance for actual complications. This changed dramatically in 1994, when the mathematician P. Shor announced a quantum algorithm that could efficiently solve one of the most intriguing problems in applied mathematics – factoring large numbers into their constituent primes, which, for instance, can be used to break commonly used public key cryptography schemes like RSA.

Shor’s algorithm is significantly more complicated than the quantum algorithms that we have studied so far, so we start with a short overview and then look at the individual pieces in more detail.

Given a number M that we want to factorize, the first part of Shor’s algorithm is to find a number x which has no common divisor with M so that it is a unit modulo M. In practice, we can just guess some x and compute the greatest common divisor gcd(x,M) – if this is not one, we have found a factor of M and are done, if this is one, we have the number x that we need. This step can still be done efficiently on a classical computer and does not require a quantum computer.

The next part of the algorithm now uses a quantum algorithm to determine the period of x. The period is the smallest non-zero number r such that

$x^r \equiv 1 \mod M$

The core of this step is the quantum algorithm that we will study below. However, the quantum algorithm does not exactly return the number r, but it returns a number s which is close to a multiple of $\frac{N}{r}$, where N is a power of two. Getting r out of this information is again a classical step that uses the theory of continued fractions. The number N that appears here is N = 2n where n is the number of qubits that the quantum part requires, and needs to be chosen such that M2 can be represented with n bits.

Finally, the third part of the algorithm uses the period r to find a factor of M, which is again done classically using elementary number theory. Thus the overall layout of the algorithm is as follows.

• Find the smallest number n (the number of qubits that we will need) such that $M^2 \leq N = 2^n$ and a number $x < M$ such that gcd(x,M) = 1
• Use the quantum part of the algorithm to find a number s which is approximately an integer multiple of N / r
• Use the theory of continued fractions to extract the period r from this information
• Use the period to find a factor of M

We will know look at each of these steps in turn (yes, this is going to be a bit of a lengthy post). To make this more tangible, we use a real example and assume that we wanted to factor the number M = 21. This is of course a toy example, but it allows us to simulate and visualize the procedure efficiently on a classical computer.

Determine n and x

The first step is easy. First, we need to determine the number n of qubits that we need. As mentioned above, this is simply the bit length of M2. In our case, M2 = 441, and the next power of two is 512 = 29, so we need n = 9 qubits.

The next step is to find the number x. This can easily be done by just randomly picking some x and checking that is has no common prime factor with M. In our example, let us choose x = 11 (which is a prime number, but this is just by accident and not needed). It is important to choose this rather randomly, as the algorithm might fail in some rare instances and we need to start over, but this only makes sense if we do not pick the same choice again for our second trial.

The quantum part of the algorithm

Now the quantum part of the algorithm starts. We want to calculate the period of x = 11, i.e. the smallest number r such that xr – 1 is a multiple of M = 21.

Of course, as our numbers are small, we could easily calculate the period classically by taking successive powers of 11 and reducing modulo 21, and this would quickly tell us that the period is 6. This, however, is no longer feasible with larger numbers, and this is where our quantum algorithm comes into play.

The algorithm uses two quantum registers. The first register has n qubits, the second can have less, in fact any number of qubits will do as long as we can store all numbers up to M in it, i.e. the bit length of M will suffice. Initially, we bring the system into the superposition

$\frac{1}{\sqrt{N}} \sum_k |k \rangle |0 \rangle$

which we can for instance do by starting with the state with all qubits being zero and then applying the Hadamard-Walsh transformation to the first register.

Next, we consider the function f that maps a number k to xk modulo M. As for every classical function, we can again find a quantum circuit Uf that represents this function on the level of qubits and apply it to our state to obtain the state

$\frac{1}{\sqrt{N}} \sum_k |k \rangle |x^k \mod M \rangle$

In his original paper [2], Shor calls this part the modular exponentiation and shows that this is actually the part of the quantum algorithm where most gates are needed (not the quantum Fourier transform).

This state has already some periodicity built into it, as xk modulo M is periodic with period r. If we could measure all the amplitudes, we could easily determine r. However, every such measurement destroys the quantum state and we have to start again, so this algorithm will not be very efficient. So again, the measurement is an issue.

Now, Shor’s idea is to solve the measurement issue by first applying (the inverse of) a quantum Fourier transform to the first register and then measure the first register (we apply the inverse of the quantum Fourier transform while other sources will state that the algorithm uses the quantum Fourier transform itself, but this is just a matter of convention as to which transformation you call the Fourier transform). The outcome s of this measurement will then give us the period!

To get an idea why this is true, let us look at a simpler case. Assume that, before applying the quantum Fourier transform, we measure the value of the second register. Let us call this value y. Then, we can write y as a power of x modulo M. Let k0 be the smallest exponent such that

$x^{k_0} = y$

Then, due to the periodicity, all values of k such that xk = y modulo M are given by

$k = k_0 + jr$

Here the index j needs to be chosen such that k0 + jr is still smaller than M. Let A denote the number of possible choices for j. Then, after the measurement, our state will have collapsed to

$\frac{1}{\sqrt{A}} \sum_{j=0}^{A-1} |k_0 + jr \rangle |y \rangle$

Let us now apply the inverse of the quantum Fourier transform to this state. The result will be the state

$\frac{1}{\sqrt{AN}} \sum_{j=0}^{A-1} \sum_{s=0}^{N-1} \eta^{(x_0 + jr)s} |s \rangle$

Now let us measure the first register. From the expression above, we can read off the probability P(s) to measure a certain value of s – we just have to add up the squares of all amplitudes with this value of s. This gives us

$P(s) = \frac{1}{AN} \big| \sum_{j=0}^{A-1} \eta^{jrs} \big|^2$

This looks complicated, but in fact this is again a geometric series with coefficient $q = \eta^{rs}$. To see how the value of the series depends on s, let us assume for a moment that the period divides N (which is very unlikely in practice as N is a power of two, but let us assume this anyway just for the sake of argument), i.e. that N = r u with the frequency u being an integer. Thus, if s is a multiple of u, the coefficient q is equal to one (as $\eta^N = 1$) and the geometric series sums up to A, giving probability 1 / N to measure this value. If, however, s is not a multiple of u, the value of the geometric series is

$\frac{1 - q^A}{1 - q}$

But in our case, A is of course simply equal to u, and therefore qA is equal to one. Thus the amplitude is zero! We find – note the similarity to our analysis of the Fourier transform of a periodic sequence – that P(s) is sharply peaked at multiples of $u = \frac{N}{r}$!

We were able to derive this result using a few simplifications – an additional measurement and the assumption that the frequency is an integer. However, as carried out by Shor in [2], a careful analysis shows that these assumptions are not needed. In fact, one can show (if you want to see all the nitty-gritty details, you could look at Shor’s paper or at my notes on GitHub that are based on an argument that I have seen first in Preskill’s lecture notes) that with reasonably high probability, the result s of the measurement will be such that

$\big| \{sr\}_N \big| \leq \frac{r}{2}$

where $\{sr\}_N$ denotes the residual of sr modulo N. Intuitively, this means that with high probability, the residual is very small, i.e. rs is close to a multiple of N, i.e. s is close to a multiple of N / r. In other words, it shows that in fact, P(s) has peaks at multiples of N / r.

The diagram below plots the probability distribution P(s) for our example, i.e. N = 512 and r = 6 (this plot has been generated using the demo Shor.py available in my GitHub account which uses the numpy package to simulate a run of Shor’s algorithm on a classical computer)

As expected, we see sharp peaks, located roughly at multiples of 512 / 6 = 85.33. So when we measure the first register, the value s will be close to a multiple of 512 / 6 with a very high probability.

So the quantum algorithm can be summarized as follows.

• Prepare a superposition $\frac{1}{\sqrt{N}} \sum_k |k \rangle |x^k \mod M \rangle$
• Apply the (inverse of the) quantum Fourier transform to this state
• Measure the value of the first register and call the result s

When running the simulation during which the diagram above was created, I did in fact get a measurement at s = 427 which is very close to 5*512 / 6.

Extracting the period

So having our measurement s = 427 in our hands, how can we use this to determine the period r? We know from the considerations above that s is close to a multiple of N / r, i.e. we know that there is an integer d such that

$\big| sr - dN \big| \leq \frac{r}{2}$

which we can rewrite as

$\big| \frac{s}{N} - \frac{d}{r} \big| \leq \frac{1}{2N} < \frac{1}{M^2}$

Thus we are given two rational numbers – s / N and d / r – which we know to be very close to each other. We have the first number s / N in our hands and want to determine the second number. We also know that the denominator r of the second number is smaller than M. Is this sufficient to determine d and r?

Surprisingly, the answer is “yes”. We will not go into details at this point and gloss over some of the number theory, but refer the reader to the classical reference [1] or to my notes for more details). The first good news is that two different fractions with denominators less than M need to be at least by 1 / M2 apart, so the number d / r is unique. The situation is indicated in the diagram below.

But how to find it? Luckily, the theory of continued fractions comes to the rescue. If you are not familiar with continued fractions, you can find out more in the appendix of my notes or on the very good Wikipedia page on this. Here, we will just go through the general procedure using our example at hand.

First, we write

$\cfrac{427}{512} = 0 + \cfrac{1}{\cfrac{512}{427}} = 0 + \cfrac{1}{1 + \cfrac{85}{427}}$

We can do the same with 85 / 427, i.e. we can write

$\cfrac{85}{427} = \cfrac{1}{\cfrac{427}{85}} = \cfrac{1}{5 + \cfrac{2}{85}}$

which will give us the decomposition

$\cfrac{427}{512} = 0 + \cfrac{1}{1 + \cfrac{1}{5 + \cfrac{2}{85}}}$

Driving this one step further, we finally obtain

$\cfrac{427}{512} = 0 + \cfrac{1}{1 + \cfrac{1}{5 + \cfrac{1}{42 + \cfrac{1}{2}}}}$

This is called the continued fraction expansion of the rational number 427 / 512. More generally, for every sequence $[a_0 ; a_1, a_2, \dots ]$, we can form the continued fraction

$a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \dots}}$

given by that sequence of coefficients, which is obviously a rational number. One can show that in fact every rational number has a representation as a continued fraction, and our calculation has shown that

$\cfrac{427}{512} = [0; 1,5,42,2]$

This sequence has five coefficients. Now given a number m, we can of course look at the sequence that we obtain by looking at the cofficients up to index m only. For instance, for m = 3, this would give us the sequence

$[0; 1,5,42]$

The rational number represented by this sequence is

$0 + \cfrac{1}{1 + \cfrac{1}{5 + \cfrac{1}{42}}} = \cfrac{211}{253}$

and is called the m-th convergent of the original continued fraction. We have such a convergent for every m, and thus get a (finite) series of rational numbers with the last one being the original number.

One can now show that given any rational number x, the series of m-th convergents of its continued fraction expansion has the following properties.

• The convergents are in their lowest terms
• With increasing m, the difference between x and the m-th convergent gets smaller and smaller, i.e. the convergents form an approximation of x that gets better and better
• The denominators of the convergents are increasing

So the convergents can be used to approximate the rational number x by fractions with smaller denominator – and this is exactly what we need: we wish to approximate the rational number s / N by a ratio d / r with smaller denominator which we then know to be the period. Thus we need to look at the convergents of 427 / 512. These can be easily calculated and turn out to be

$0, 1, \cfrac{5}{6}, \cfrac{211}{253}, \cfrac{427}{512}$

The last convergent whose denominator is still smaller than M = 21 is 5 / 6, and thus we obtain r = 6. This is the period that we are looking for!

So in general, the recipe to get from the measured value s to r is to calculate the convergents of the rational number s / N and pick the denominator of the last convergent that has a denominator less than M. Again, if you want to see the exact algorithm, you can take a look at my script Shor.py.

Find the factor

We are almost done. We have run the quantum algorithm to obtain an approximate multiple of N / r. We have then applied the theory of continued fractions to derive the period r of x from this measurement. The last step – which is again a purely classical step – is now to use this to find a factor of M. This is in fact comparatively easy.

Recall that – by definition of the period – we get one if we raise x to the power of r and than reduce module M. In other words, xr minus one is a multiple of M. Now assume that we are lucky and the period r is even. Then

$(x^{\frac{r}{2}} - 1)(x^{\frac{r}{2}} + 1) = (x^r - 1) \equiv 0 \mod M$

With a bit of elementary number theory, one can now show that this implies that the greatest common divisor gcd(xr/2-1, M) is a factor of M (unless, in fact, xr/2 is minus one modulo M, in which case the argument fails). So to get from the period to a potential factor of M, we simply calculate this greatest common divisor and check whether it divides M!

Let us do this for our case. Our period is r = 6. With x = 11, we have x3 = 1331, which is 8 module M. Thus

$\text{gcd}(x^{\frac{r}{2}} - 1 \mod M, M) = \text{gcd}(7,21) = 7$

which is the factor of M = 21 that we were looking for.

Performance of the algorithm

In our derivation, we have ignored a few special cases which can make the algorithm fail. For instance, suppose we had not measured s = 427, but s = 341 after applying the Fourier transform. Then the corresponding approximation to 341 / 512 would have been 4 / 6. However, the continued fraction algorithm always produces a result that is in its lowest terms, i.e. it would give us not 4 / 6, but 2 / 3. Looking at this, we would infer that r = 3, which is not the correct result.

There are a few other things that can go wrong. For instance, we could find a period r which is odd, so that our step to derive a factor of M from r does not work, or we might measure an unlikely value of s.

In all these cases, we need to start over and repeat the algorithm. Fortunately, Shor could show that the probability that any of this happens is bounded from below. This bound is decreasing with larger values of M, but it is decreasing so slowly that the expected number of trials that we need grows at most logarithmically and does not destroy the overall performance of the algorithm.

Taking all these considerations into account and deriving bounds for the number of gates required to perform the quantum part of the algorithm, Shor was able to show that the number of steps to obtain a result grows at most polynomially with the number of bits that the number M has. This is obviously much better than the best classical algorithm that requires a bit less than exponential time to factor M. Thus, assuming that we are able to build a working quantum computer with the required number of gates and qubits, this algorithm would be able to factorize large numbers exponentially faster than any known classical algorithm.

Shor’s algorithm provides an example for a problem that is believed to be in the class NP (but not in P) on a classical computer, but in the class BQP on a quantum computer – this is the class of all problems that can be solved in polynomial time with a finite probability of success. However, even though factorization is generally believed not to be in P, i.e. not doable in polynomial time on classical hardware, there is not proof for that. And, even more important, it is not proved that factorization is NP-complete. Thus, Shor’s algorithm does not render every problem in NP solvable in polynomial time on a quantum computer. It does, however, still imply that all public key cryptography systems like RSA that rely on the assumption that large numbers are difficult to factor become inherently insecure once a large scale reliable quantum computer becomes available.

References

1. G.H. Hardy, E.M. Wright, An introduction to the theory of numbers, Oxford University Press, Oxford, 1975
2. P. Shor, Polynomial-Time Algorithms for Prime Factorization and Discrete Logarithms on a Quantum Computer, SIAM J.Sci.Statist.Comput. Vol. 26 Issue 5 (1997), pp 1484–1509, available as arXiv:quant-ph/9508027v2