Shor’s quantum factoring algorithm

Until the nineties of the last century, quantum computing seemed to be an interesting theoretical possibility, but it was far from clear whether it could be useful to tackle computationally hard problems with high relevance for actual complications. This changed dramatically in 1994, when the mathematician P. Shor announced a quantum algorithm that could efficiently solve one of the most intriguing problems in applied mathematics – factoring large numbers into their constituent primes, which, for instance, can be used to break commonly used public key cryptography schemes like RSA.

Shor’s algorithm is significantly more complicated than the quantum algorithms that we have studied so far, so we start with a short overview and then look at the individual pieces in more detail.

Given a number M that we want to factorize, the first part of Shor’s algorithm is to find a number x which has no common divisor with M so that it is a unit modulo M. In practice, we can just guess some x and compute the greatest common divisor gcd(x,M) – if this is not one, we have found a factor of M and are done, if this is one, we have the number x that we need. This step can still be done efficiently on a classical computer and does not require a quantum computer.

The next part of the algorithm now uses a quantum algorithm to determine the period of x. The period is the smallest non-zero number r such that

x^r \equiv 1 \mod M

The core of this step is the quantum algorithm that we will study below. However, the quantum algorithm does not exactly return the number r, but it returns a number s which is close to a multiple of \frac{N}{r}, where N is a power of two. Getting r out of this information is again a classical step that uses the theory of continued fractions. The number N that appears here is N = 2n where n is the number of qubits that the quantum part requires, and needs to be chosen such that M2 can be represented with n bits.

Finally, the third part of the algorithm uses the period r to find a factor of M, which is again done classically using elementary number theory. Thus the overall layout of the algorithm is as follows.

  • Find the smallest number n (the number of qubits that we will need) such that M^2  \leq N = 2^n and a number x < M such that gcd(x,M) = 1
  • Use the quantum part of the algorithm to find a number s which is approximately an integer multiple of N / r
  • Use the theory of continued fractions to extract the period r from this information
  • Use the period to find a factor of M

We will know look at each of these steps in turn (yes, this is going to be a bit of a lengthy post). To make this more tangible, we use a real example and assume that we wanted to factor the number M = 21. This is of course a toy example, but it allows us to simulate and visualize the procedure efficiently on a classical computer.

Determine n and x

The first step is easy. First, we need to determine the number n of qubits that we need. As mentioned above, this is simply the bit length of M2. In our case, M2 = 441, and the next power of two is 512 = 29, so we need n = 9 qubits.

The next step is to find the number x. This can easily be done by just randomly picking some x and checking that is has no common prime factor with M. In our example, let us choose x = 11 (which is a prime number, but this is just by accident and not needed). It is important to choose this rather randomly, as the algorithm might fail in some rare instances and we need to start over, but this only makes sense if we do not pick the same choice again for our second trial.

The quantum part of the algorithm

Now the quantum part of the algorithm starts. We want to calculate the period of x = 11, i.e. the smallest number r such that xr – 1 is a multiple of M = 21.

Of course, as our numbers are small, we could easily calculate the period classically by taking successive powers of 11 and reducing modulo 21, and this would quickly tell us that the period is 6. This, however, is no longer feasible with larger numbers, and this is where our quantum algorithm comes into play.

The algorithm uses two quantum registers. The first register has n qubits, the second can have less, in fact any number of qubits will do as long as we can store all numbers up to M in it, i.e. the bit length of M will suffice. Initially, we bring the system into the superposition

\frac{1}{\sqrt{N}} \sum_k |k \rangle |0 \rangle

which we can for instance do by starting with the state with all qubits being zero and then applying the Hadamard-Walsh transformation to the first register.

Next, we consider the function f that maps a number k to xk modulo M. As for every classical function, we can again find a quantum circuit Uf that represents this function on the level of qubits and apply it to our state to obtain the state

\frac{1}{\sqrt{N}} \sum_k |k \rangle |x^k \mod M \rangle

In his original paper [2], Shor calls this part the modular exponentiation and shows that this is actually the part of the quantum algorithm where most gates are needed (not the quantum Fourier transform).

This state has already some periodicity built into it, as xk modulo M is periodic with period r. If we could measure all the amplitudes, we could easily determine r. However, every such measurement destroys the quantum state and we have to start again, so this algorithm will not be very efficient. So again, the measurement is an issue.

Now, Shor’s idea is to solve the measurement issue by first applying (the inverse of) a quantum Fourier transform to the first register and then measure the first register (we apply the inverse of the quantum Fourier transform while other sources will state that the algorithm uses the quantum Fourier transform itself, but this is just a matter of convention as to which transformation you call the Fourier transform). The outcome s of this measurement will then give us the period!

To get an idea why this is true, let us look at a simpler case. Assume that, before applying the quantum Fourier transform, we measure the value of the second register. Let us call this value y. Then, we can write y as a power of x modulo M. Let k0 be the smallest exponent such that

x^{k_0} = y

Then, due to the periodicity, all values of k such that xk = y modulo M are given by

k = k_0 + jr

Here the index j needs to be chosen such that k0 + jr is still smaller than M. Let A denote the number of possible choices for j. Then, after the measurement, our state will have collapsed to

\frac{1}{\sqrt{A}} \sum_{j=0}^{A-1} |k_0 + jr \rangle  |y \rangle

Let us now apply the inverse of the quantum Fourier transform to this state. The result will be the state

\frac{1}{\sqrt{AN}} \sum_{j=0}^{A-1} \sum_{s=0}^{N-1} \eta^{(x_0 + jr)s} |s \rangle

Now let us measure the first register. From the expression above, we can read off the probability P(s) to measure a certain value of s – we just have to add up the squares of all amplitudes with this value of s. This gives us

P(s) = \frac{1}{AN} \big| \sum_{j=0}^{A-1}  \eta^{jrs} \big|^2

This looks complicated, but in fact this is again a geometric series with coefficient q = \eta^{rs} . To see how the value of the series depends on s, let us assume for a moment that the period divides N (which is very unlikely in practice as N is a power of two, but let us assume this anyway just for the sake of argument), i.e. that N = r u with the frequency u being an integer. Thus, if s is a multiple of u, the coefficient q is equal to one (as \eta^N = 1 ) and the geometric series sums up to A, giving probability 1 / N to measure this value. If, however, s is not a multiple of u, the value of the geometric series is

\frac{1 - q^A}{1 - q}

But in our case, A is of course simply equal to u, and therefore qA is equal to one. Thus the amplitude is zero! We find – note the similarity to our analysis of the Fourier transform of a periodic sequence – that P(s) is sharply peaked at multiples of u = \frac{N}{r} !

We were able to derive this result using a few simplifications – an additional measurement and the assumption that the frequency is an integer. However, as carried out by Shor in [2], a careful analysis shows that these assumptions are not needed. In fact, one can show (if you want to see all the nitty-gritty details, you could look at Shor’s paper or at my notes on GitHub that are based on an argument that I have seen first in Preskill’s lecture notes) that with reasonably high probability, the result s of the measurement will be such that

\big| \{sr\}_N \big| \leq \frac{r}{2}

where \{sr\}_N denotes the residual of sr modulo N. Intuitively, this means that with high probability, the residual is very small, i.e. rs is close to a multiple of N, i.e. s is close to a multiple of N / r. In other words, it shows that in fact, P(s) has peaks at multiples of N / r.

The diagram below plots the probability distribution P(s) for our example, i.e. N = 512 and r = 6 (this plot has been generated using the demo available in my GitHub account which uses the numpy package to simulate a run of Shor’s algorithm on a classical computer)


As expected, we see sharp peaks, located roughly at multiples of 512 / 6 = 85.33. So when we measure the first register, the value s will be close to a multiple of 512 / 6 with a very high probability.

So the quantum algorithm can be summarized as follows.

  • Prepare a superposition \frac{1}{\sqrt{N}} \sum_k |k \rangle |x^k \mod M \rangle
  • Apply the (inverse of the) quantum Fourier transform to this state
  • Measure the value of the first register and call the result s

When running the simulation during which the diagram above was created, I did in fact get a measurement at s = 427 which is very close to 5*512 / 6.

Extracting the period

So having our measurement s = 427 in our hands, how can we use this to determine the period r? We know from the considerations above that s is close to a multiple of N / r, i.e. we know that there is an integer d such that

\big| sr - dN \big| \leq \frac{r}{2}

which we can rewrite as

\big| \frac{s}{N} - \frac{d}{r} \big| \leq \frac{1}{2N} < \frac{1}{M^2}

Thus we are given two rational numbers – s / N and d / r – which we know to be very close to each other. We have the first number s / N in our hands and want to determine the second number. We also know that the denominator r of the second number is smaller than M. Is this sufficient to determine d and r?

Surprisingly, the answer is “yes”. We will not go into details at this point and gloss over some of the number theory, but refer the reader to the classical reference [1] or to my notes for more details). The first good news is that two different fractions with denominators less than M need to be at least by 1 / M2 apart, so the number d / r is unique. The situation is indicated in the diagram below.


But how to find it? Luckily, the theory of continued fractions comes to the rescue. If you are not familiar with continued fractions, you can find out more in the appendix of my notes or on the very good Wikipedia page on this. Here, we will just go through the general procedure using our example at hand.

First, we write

\cfrac{427}{512} = 0 + \cfrac{1}{\cfrac{512}{427}}  = 0 + \cfrac{1}{1 + \cfrac{85}{427}}

We can do the same with 85 / 427, i.e. we can write

\cfrac{85}{427} = \cfrac{1}{\cfrac{427}{85}} =  \cfrac{1}{5 + \cfrac{2}{85}}

which will give us the decomposition

\cfrac{427}{512} = 0 + \cfrac{1}{1 + \cfrac{1}{5 + \cfrac{2}{85}}}

Driving this one step further, we finally obtain

\cfrac{427}{512} = 0 + \cfrac{1}{1 + \cfrac{1}{5 + \cfrac{1}{42 + \cfrac{1}{2}}}}

This is called the continued fraction expansion of the rational number 427 / 512. More generally, for every sequence [a_0 ; a_1, a_2, \dots ], we can form the continued fraction

a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \dots}}

given by that sequence of coefficients, which is obviously a rational number. One can show that in fact every rational number has a representation as a continued fraction, and our calculation has shown that

\cfrac{427}{512} = [0; 1,5,42,2]

This sequence has five coefficients. Now given a number m, we can of course look at the sequence that we obtain by looking at the cofficients up to index m only. For instance, for m = 3, this would give us the sequence

[0; 1,5,42]

The rational number represented by this sequence is

0 + \cfrac{1}{1 + \cfrac{1}{5 + \cfrac{1}{42}}}  = \cfrac{211}{253}

and is called the m-th convergent of the original continued fraction. We have such a convergent for every m, and thus get a (finite) series of rational numbers with the last one being the original number.

One can now show that given any rational number x, the series of m-th convergents of its continued fraction expansion has the following properties.

  • The convergents are in their lowest terms
  • With increasing m, the difference between x and the m-th convergent gets smaller and smaller, i.e. the convergents form an approximation of x that gets better and better
  • The denominators of the convergents are increasing

So the convergents can be used to approximate the rational number x by fractions with smaller denominator – and this is exactly what we need: we wish to approximate the rational number s / N by a ratio d / r with smaller denominator which we then know to be the period. Thus we need to look at the convergents of 427 / 512. These can be easily calculated and turn out to be

0, 1, \cfrac{5}{6}, \cfrac{211}{253}, \cfrac{427}{512}

The last convergent whose denominator is still smaller than M = 21 is 5 / 6, and thus we obtain r = 6. This is the period that we are looking for!

So in general, the recipe to get from the measured value s to r is to calculate the convergents of the rational number s / N and pick the denominator of the last convergent that has a denominator less than M. Again, if you want to see the exact algorithm, you can take a look at my script

Find the factor

We are almost done. We have run the quantum algorithm to obtain an approximate multiple of N / r. We have then applied the theory of continued fractions to derive the period r of x from this measurement. The last step – which is again a purely classical step – is now to use this to find a factor of M. This is in fact comparatively easy.

Recall that – by definition of the period – we get one if we raise x to the power of r and than reduce module M. In other words, xr minus one is a multiple of M. Now assume that we are lucky and the period r is even. Then

(x^{\frac{r}{2}} - 1)(x^{\frac{r}{2}} + 1) = (x^r - 1) \equiv 0 \mod M

With a bit of elementary number theory, one can now show that this implies that the greatest common divisor gcd(xr/2-1, M) is a factor of M (unless, in fact, xr/2 is minus one modulo M, in which case the argument fails). So to get from the period to a potential factor of M, we simply calculate this greatest common divisor and check whether it divides M!

Let us do this for our case. Our period is r = 6. With x = 11, we have x3 = 1331, which is 8 module M. Thus

\text{gcd}(x^{\frac{r}{2}} - 1 \mod M, M) =  \text{gcd}(7,21) = 7

which is the factor of M = 21 that we were looking for.

Performance of the algorithm

In our derivation, we have ignored a few special cases which can make the algorithm fail. For instance, suppose we had not measured s = 427, but s = 341 after applying the Fourier transform. Then the corresponding approximation to 341 / 512 would have been 4 / 6. However, the continued fraction algorithm always produces a result that is in its lowest terms, i.e. it would give us not 4 / 6, but 2 / 3. Looking at this, we would infer that r = 3, which is not the correct result.

There are a few other things that can go wrong. For instance, we could find a period r which is odd, so that our step to derive a factor of M from r does not work, or we might measure an unlikely value of s.

In all these cases, we need to start over and repeat the algorithm. Fortunately, Shor could show that the probability that any of this happens is bounded from below. This bound is decreasing with larger values of M, but it is decreasing so slowly that the expected number of trials that we need grows at most logarithmically and does not destroy the overall performance of the algorithm.

Taking all these considerations into account and deriving bounds for the number of gates required to perform the quantum part of the algorithm, Shor was able to show that the number of steps to obtain a result grows at most polynomially with the number of bits that the number M has. This is obviously much better than the best classical algorithm that requires a bit less than exponential time to factor M. Thus, assuming that we are able to build a working quantum computer with the required number of gates and qubits, this algorithm would be able to factorize large numbers exponentially faster than any known classical algorithm.

Shor’s algorithm provides an example for a problem that is believed to be in the class NP (but not in P) on a classical computer, but in the class BQP on a quantum computer – this is the class of all problems that can be solved in polynomial time with a finite probability of success. However, even though factorization is generally believed not to be in P, i.e. not doable in polynomial time on classical hardware, there is not proof for that. And, even more important, it is not proved that factorization is NP-complete. Thus, Shor’s algorithm does not render every problem in NP solvable in polynomial time on a quantum computer. It does, however, still imply that all public key cryptography systems like RSA that rely on the assumption that large numbers are difficult to factor become inherently insecure once a large scale reliable quantum computer becomes available.


1. G.H. Hardy, E.M. Wright, An introduction to the theory of numbers, Oxford University Press, Oxford, 1975
2. P. Shor, Polynomial-Time Algorithms for Prime Factorization and Discrete Logarithms on a Quantum Computer, SIAM J.Sci.Statist.Comput. Vol. 26 Issue 5 (1997), pp 1484–1509, available as arXiv:quant-ph/9508027v2

The quantum Fourier Transform

We are getting closer to the most spectacular early quantum algorithm – Shor’s algorithm for factoring large composite numbers which can be used to break the most widely used public key cryptography systems. But before we can tackle this algorithm, there is one more thing that we need to understand – the quantum Fourier transform.

The discrete Fourier transform

Let us leave the quantum world for a few moments and take a look at a classical area in mathematics – the discrete Fourier transform. To motivate the definition to come, let us for a moment assume that we are observing a time dependent signal, i.e. a function f(t) depending on the time t. Let us also assume that we are interested in processing this signal further, using a digital computer. This will force us to reduce the full signal to a finite number of values, i.e. to a set of values f(0), f(1), …, f(N-1) at N discrete points in time.

Now assume that we wanted to calculate the classical Fourier transform of this function, i.e. the function \hat{f} given by (the exact formula depends on a few conventions)

\hat{f}(x) = \int_{-\infty}^{\infty} f(t) e^{-i x t} dt

If we now replace the function f by the sum of those functions which have value f(i) over a range of length 1, i.e. by its discrete version, this formula turns into

\hat{f}(x) = \sum_{t=0}^{N-1} f(t) e^{-i x t}

Now we could code the discrete values f(0), f(1), .. as a sequence xk of numbers, and we could apply the same process to the Fourier transform and replace it by a sequence Xk of numbers, representing again the values at the discrete points 0, 1, …. The above formula then tells us that these numbers would be given by

X_k = \sum_{j=0}^{N-1} x_j e^{-i j k}

We could now look at this formula as giving us an abstract mapping from the set of sequences of N numbers to itself, provided by (putting in a factor 2 \pi / N )

X_k = \sum_{j=0}^{N-1} x_j e^{- \frac{2\pi i}{N} j k}

This transformation is called the discrete Fourier transform.

There is a different way to look at this which is useful to derive some of the properties of the discrete Fourier transform. For every k, we can build the N-dimensional complex vector u^k with components u^k_n by setting

u^k_n = e^{\frac{2\pi i}{N} k n}

These vectors are mutually orthogonal with respect to the standard hermitian product. In fact, the product of any two of these vectors is given by

(u^k u^{k'})^* = \sum_{j=0}^{N-1} e^{\frac{2\pi i}{N}j(k-k')}

If k = k' , this is clearly N. Otherwise, we can write this as

(u^k u^{k'})^* = \sum_{j=0}^{N-1} q^j


q=e^{\frac{2\pi i}{N}(k-k')}

If q is not equal to one, i.e. if k is not equal to k’, then, according to the usual formula for a geometric series, this is equal to


However, this is zero, because q^N = 1 due to the periodicity of the exponential function.

Using this orthogonal basis, we can write the formula for the discrete Fourier transform simply as the hermitian product

X_k = \langle x, u^k \rangle

In particular, the discrete Fourier transform can be seen as the linear transformation that maps the k-th unit vector onto the vector (u^k)^* . By adding a factor 1 / \sqrt{N} , this can be turned into a unitary transformation, i.e. in a certain sense the discrete Fourier transform is simply a rotation. This relation can also be used to easily derive a formula for the inverse of the discrete Fourier transform. In fact, if we expand the vector x into the basis \{u^k\}_k , we obtain

x = \frac{1}{N} \sum_k \langle x, u^k \rangle u^k

and therefore

x_k = \langle x, e_k \rangle  = \frac{1}{N} \sum_j \langle x, u^j \rangle \langle u^j, e_k \rangle = \frac{1}{N} \sum_j X_j e^{\frac{2\pi i}{N} jk}

Fourier transforms of a periodic sequence

For what follows, let us adapt and simplify our notation a bit. First, we add a factor 1 / \sqrt{N} to the formula for the Fourier coefficient, so that the Fourier transform is unitary. Second, we use the symbol \eta to denote the N-th root of unity, i.e.

\eta = e^{\frac{2\pi i }{N}}

With this notation, the formula for the Fourier coefficient is then

X_k = \frac{1}{\sqrt{N}} \sum_j x_j \eta^{-kj}

and the inverse is given by

x_k = \frac{1}{\sqrt{N}} \sum_j X_j \eta^{kj}

Let us now study a special case that is highly relevant for the applications to the factoring of large numbers – the Fourier transform of a periodic sequence. Thus, suppose there is a sequence xk and a number r called the period such that

x_{rs + t} = x_t

for all values of s and t that lead to valid indices. Thus, roughly speaking, after r indices, the values of the sequence repeat themselves. It is also common to reserve the term period for the smallest number with this property. We call the number u = \frac{N}{r} the frequency of the sequence.

Let us now assume that the frequency is a natural number, i.e. that the period divides N, and let us try to understand what this means for the Fourier transform. Using the periodicity, we can write the coefficients as follows.

X_k = \frac{1}{\sqrt{N}} \sum_j x_j \eta^{-jk} = \frac{1}{\sqrt{N}} \sum_{s,t} x_{sr + t}\eta^{(-sr+t)k} = \frac{1}{\sqrt{N}} (\sum_s \eta^{-srk})(\sum_t x_t \eta^{-tk})

where s runs from 0 to u-1. We can now write the first of the sums as follows.

\sum_{s=0}^{u-1} \eta^{-srk} =  \sum_{s=0}^{u-1} e^{-\frac{2\pi i}{u}s k}

Now this is again a geometric series with q = e^{-\frac{2\pi i}{u}k} ! We can therefore conclude as above that this is zero unless q is one, i.e. unless k is a multiple of the frequency u. Thus we have shown that if the sequence xk is periodic with integral frequency u, then all Fourier coefficients Xk are zero for which k is not a multiple of the frequency.

In the later applications, this fact will be applied in a situation where we know that a sequence is periodic, but the period is unknown. We will then perform a Fourier transformation and inspect the coefficients Xk. We can then conclude that the unknown frequency u must be a common divisor of all those indices k for which Xk is different from zero. This is exactly true if the period divides N, and we will see later that in important cases, it is still approximately true if the period does not divide N.

A quantum algorithm to compute the discrete Fourier transform

Let us now carry over our considerations to the world of quantum computing. In a quantum computer with n qubits, the Hilbert space of states is N-dimensional with N=2n, and is spanned by the basis |k \rangle . Every vector can be described by the sequence of its coefficients when expanding it into this basis. Consequently, the discrete Fourier transform defines a unitary transformation on the Hilbert by applying the mapping

\sum_k x_k |k \rangle \mapsto \sum_k X_k |k \rangle

Now this is a unitary transformation, and as any such transformation, can be implemented by a quantum circuit. We will not go into details on this, see for instance [1], section 7.8 or section 5.1 of [2] (but be careful, these authors use a different sign convention for the Fourier transform). The important part, however, is that a quantum Fourier transform for N=2n can be realized with O(n2) quantum gates.

In contrast to this, the best known classical algorithms for computing the discrete Fourier transform are commonly known as fast Fourier transform and require O(Nn) steps. Thus it seems that we have again found a quantum algorithm which is substantially faster than its classical counterpart.

Unfortunately, this is not really true, as we have not yet defined our measurement procedure. In fact, if we measure the result of applying a quantum Fourier transform, we destroy the superposition. In addition, the coefficients in which we are interested are the amplitudes of the possible outcomes, and there is no obvious way to measure them – even if we perform the transformation several times and measure over and over again, we only obtain approximations to the probabilities which are the absolute values of the squared amplitudes, so we do not obtain the phase information. Therefore, it is far from clear whether a quantum algorithm can help to compute the Fourier transform more efficiently than it is possible with a classical algorithm.

However, most applications of the Fourier transform in quantum algorithms are indirect, using the transform as a sort of amplification step to focus amplitudes on interesting states. In the next post, we will look at Shor’s algorithm with exploits the periodicity properties of the Fourier transform to factor large composite numbers.


1. E. Rieffel, W. Polak, Quantum computing: a gentle introduction, MIT Press
2. M.A. Nielsen, I.L. Chuang, Quantum computation and quantum information, Cambridge University Press

Grover’s algorithm – unstructured search with a quantum computer

In the last post, we have looked at the Deutsch-Jozsa algorithm that is considered to be the first example of a quantum algorithm that is structurally more efficient than any classical algorithm can probably be. However, the problem solved by the algorithm is rather special. This does, of course, raise the question whether a similar speed-up can be achieved for problems that are more relevant to practical applications.

In this post, we will discuss an algorithm of this type – Grover’s algorithm. Even though the speed-up provided by this algorithm is rather limited (which is of a certain theoretical interest in its own right), the algorithm is interesting due to its very general nature. Roughly speaking, the algorithm is concerned with an unstructured search. We are given a set of N = 2n elements, labeled by the numbers 0 to 2n-1, exactly one of which having a property denoted by P. We can model this property as a binary valued function P on the set \{0,N-1\} that is zero on all but one elements. The task is to locate the element x0 for which P(x0) is true.

Grover’s algorithm

Grover’s algorithm presented in [1] proceeds as follows to locate this element. First, we again apply the Hadamard-Walsh operator W to the state |0 \rangle of an n-qubit system to obtain a superposition of all basis states. Then, we iteratively apply the following sequence of operations.

  1. Apply a conditional phase shift S, i.e. apply the unique unitary transformation that maps |x \rangle to (-1)^{f(x)} |x \rangle .
  2. Apply the unitary transformation D called diffusion that we will describe below

Finally, after a defined number of outcomes, we perform a measurement which will collaps the system into one of the states |x \rangle . We claim – and will see why this is true below – that for the right number of iterations, this value x will, with a high likelihood, be the solution to our problem, i.e. equal to x0.

Before we can proceed, we need to define the matrix D. This matrix is \frac{2}{N} - 1 along the diagonal, with N = 2n, and \frac{2}{N} away from the diagonal. In terms of basis vectors, the mapping is given by

|i \rangle \mapsto \frac{2}{N} (\sum_j |j \rangle) - |i \rangle

Consequently, we see that

D \sum_i a_i |i \rangle = \sum_i (2 \bar{a} - a_i) |i \rangle

where \bar{a} is the average across the amplitudes a_i . Thus geometrically, the operation D performs an inversion around the average. Grover shows that this operation can be written as minus a Hadamard-Walsh operation followed by the operation that flips the sign for |0 \rangle , followed by a second Hadamard-Walsh transformation.

For the sake of completeness, let us also briefly discuss the first transformation employed by the algorithm, the conditional phase shift. We have already seen a similar transformation while studying the Deutsch-Jozsa algorithm. In fact, we have shown in the respective blog post that the circuit displayed below (with the notation slightly changed)


performs the required operation

|\psi \rangle = \sum_x a_x |x \rangle \mapsto |\psi' \rangle = \sum_x a_x (-1)^{P(x)} |x \rangle

Let us now see how why Grover’s algorithm works. Instead of going through the careful analysis in [1], we will use bar charts to visualize the quantum states (exploiting that all involved matrices are actually real valued).

It is not difficult to simulate the transformation in a simple Python notebook, at least for small values of N. This script performs several iterations of the algorithm and prints the result. The diagrams below show the outcome of this test.


Let us go through the diagrams one by one. The first diagram shows the initial state of the algorithm. I have used 3 qubits, i.e. n = 3 and N = 8. The initial state, after applying the Hadamard-Walsh transform to the zero state, is displayed in the first line. As expected, all amplitudes are equal to 1 over the square root of eight, which is approximately 0.35, i.e. we have a balanced superposition of all states.

We now apply one iteration of the algorithm. First, we apply the conditional phase flip. The element we are looking for is in this case located at x = 2. Thus, the phase flip will leave all basis vectors unchanged except for |2 \rangle and it will change the amplitude of this vector to – 0.35. This will change the average amplitude to a value slightly below 0.35. If we now perform the inversion around the average, the amplitudes of all basis vectors different from |2 \rangle will actually decrease, whereas the amplitude of |2 \rangle will increase. The result is displayed in the second line of the diagram.

Thus, what really happens in this case is an amplitude amplification – we increase the amplitude of one component of the superposition while decreasing all the others.

The next few lines show the result of repeating these two steps. We see that after the second iteration, almost all of the amplitude is concentrated on the vector |2 \rangle , which represents the solution we are looking for. If we now perform a measurement, the result will be 2 with a very high probability!

It is interesting to see that when we perform one more iteration, the difference between the amplitude of the solution and the amplitudes of all other components decreases again. Thus the correct choice for the number of iterations is critical to make the algorithm work. In the last line, we have plotted the difference between the amplitude of |2 \rangle and the other amplitudes (more precisely, the ratio between the amplitude of |2 \rangle and the second largest amplitude) on the y-axis for the different number of iterations on the x-axis. We see that the optimal number of iterations is significantly below 10 (actually five iterations give the best result in this case), and more iterations decrease the likelihood of getting the correct result out of the measurement again. In fact, a careful analysis carried out in [2] shows that for large values of N, the best number of iterations is given by \frac{\pi}{4} \sqrt{N} , and that doubling the number of iterations does in general lead to a less optimal result.

Generalizations and amplitude amplification

In a later paper ([3]), Grover describes a more general setup which is helpful to understand the basic reason why the algorithm works – the amplitude amplification. In this paper, Grover argues that given any unitary transformation U and a target state (in our case, the state representing the solution to the search problem), the probability to meet the target state by applying U to a given initial state can be amplified by a sequence of operations very much to the one considered above. We will not go into details, but present a graphical representation of the algorithm.

So suppose that we are given an n-qubit quantum system and two basis vectors – the vector t representing the target state and an initial state s. In addition, assume we are given a unitary transformation U. The goal is to reach t from s by subsequently applying U itself and a small number of additional gates.

Grover considers the two-dimensional subspace spanned by the vectors s and U-1t. If, within this subspace, we ever manage to reach U-1t, then of course one more application of U will move us into the desired target state.

Now let us consider the transformation

Q = - I_s U^{-1} I_t U

where Ix denotes a conditional phase shift that flips the phase on the vector |x \rangle . Grover shows that this transformation does in fact leave our two-dimensional subspace invariant, as indicated in the diagram below.


He then proceeds to show that for sufficiently small values of the matrix element Uts, the action of Q on this subspace can approximately be described as a rotation by the angle \frac{2}{|U_{ts}|} . Applying the operation Q n times will then approximately result in a superposition of the form

\cos (\frac{2n}{|U_{ts}|})|s \rangle + a U^{-1}|t \rangle

Thus if we can make the first coefficient very small, i.e. if \frac{4n}{|U_{ts}|} is close to a multiple of \pi , then one application of U will take our state to a state very close to t.

Let us link this description to the version of the Grover algorithm discussed above. In this version, the initial state s is the state |0 \rangle . The transformation U is the Hadamard-Walsh transformation W. The target state is |x_0 \rangle where x0 is the solution to the search problem. Thus the operation It is the conditional phase shift that we have denoted by S earlier. In addition, Grover shows in [1] already that the diffusion operator D can be expressed as -W I0 W. Now suppose we apply the transformation Q n times to the initial state and then apply U = W once more. Then our state will be

W Q \dots Q |0 \rangle

This can be written as

W (-I_0 W S W) (-I_0 W S W) \dots (-I_0 W S W) |0 \rangle

Regrouping this and using the relation D = -W I0 W, we see that this is the same as

- (W I_0 W) S (- W I_0 W) S \dots  \dots (- W I_0 W) S W |0 \rangle  = D S \dots D S W |0 \rangle

Thus the algorithm can equally well be described as applying W once to obtain a balanced superposition and then applying the sequence DS n times, which is the formulation of the algorithm used above. As |U_{ts} | = \frac{1}{\sqrt{N}} in this case, we also recover the result that the optimal number of iterations is \frac{\pi}{4} \sqrt{N} for large N.


Grover’s algorithm is highly relevant for theoretical reasons – it applies to a very generic problem and (see the discussions in [1] and [2]) is optimal, in the sense that it provides a quadratic speedup compared to the best classical algorithm that requires O(N) operations, and that this cannot be improved further. Thus Grover’s algorithm provides an interesting example for a problem where a quantum algorithm delivers a significant speedup, but no exponential speedup as we will see it later for Shor’s algorithm.

However, as discussed in detail in section 9.6 of [4], the relevance of the algorithm for practical applications is limited. First, the algorithm applies to an unstructured search, i.e. a search over unstructured data. In most practical applications, we deal with databases that have some sort of structure, and then more efficient search algorithms are known. Second, we have seen that the algorithm requires O(\sqrt{N}) applications of the transformation UP. Whether this is better than the classical algorithm does of course depend on the efficiency with which we can implement this with quantum gates. If applying UP requires O(N) operations, the advantage of the algorithm is lost. Thus the algorithm only provides a significant speedup if the operation UP can be implemented efficiently and there is no additional structure that a classical algorithm could exploit.

Examples of such problems are brute forces searches as they appear in some crypto-systems. Suppose for instance we are trying to break a message that is encrypted with a symmetric key K, and suppose that we know the first few characters of the original text. We could then try to use an unstructured search over the space of all keys to find a key which matches at least the few characters that we know.

In [5], a more detailed analysis of the complexity in terms of qubits and gates that a quantum computer would have to attack AES-256 is made, arriving at a size of a few thousand logical quantum bits. Given the current ambition level, this does not appear to be completely out of reach. It does, however, not render AES completely unsecure. In fact, as Grover’s algorithm essentially results in a quadratic speedup, a code with a key length of n bits in a pre-quantum world is essentially as secure as the same code with a key length of 2n in a post-quantum world, i.e. roughly speaking, doubling the key length compensates the advantage of quantum computing in this case. This is the reason why the NIST report on post-quantum cryptography still classifies AES as inherently secure assuming increased key sizes.

In addition, the feasibility of a quantum algorithm is not only determined by the number of qubits required, but also by other factors like the depth, i.e. the number of operations required, and the number of quantum gates – and for AES, the estimates in [5] are significant, for instance a depth of more than 2145 for AES-256, which is roughly 1043. Even if we assume a switching time of only 10-12 seconds, we still would require astronomical 1031 seconds, i.e. in the order of 1023 years, to run the algorithm.

Even a much less sophisticated analysis nicely demonstrates the problem behind these numbers – the number of iterations required. Suppose we are dealing with a key length of n bits. Then we know that the algorithm requires

\frac{\pi}{4} \sqrt{2^n} \approx 0.8 \sqrt{2}^n

iterations. Taking the decimal logarithm, we see that this is in the order of 100.15*n. Thus, for n = 256, we need in the order of 1038 iterations – a number that makes it obvious that AES-256 can still be considered secure for all practical purposes.

So overall, there is no reason to be overly concerned about serious attacks to AES with sufficiently large keys in the near future. For asymmetric keys, however, we will soon see that the situation is completely different – algorithms like RSA or Elliptic curve cryptography are once and for all broken as soon as large-scale usable quantum computer become reality. This is a consequence of Shor’s algorithm that we will study soon. But first, we need some more preliminaries that we will discuss in the next post, namely quantum Fourier transforms.


1. L.K. Grover, A fast quantum mechanical algorithm for database search, Proceedings, 28th Annual ACM Symposium on the Theory of Computing (STOC), May 1996, pages 212-219, available as arXiv:quant-ph/9605043v3
[2] M. Boyer, G. Brassard, P. Høyer, A. Tapp, Tight bounds on quantum searching, arXiv:quant-ph/9605034
3. L.K. Grover, A framework for fast quantum mechanical algorithms, arXiv:quant-ph/9711043
4. E. Rieffel, W. Polak, Quantum computing – a gentle introduction, MIT Press
5. M. Grassl, B. Langenberg, M. Roetteler, R. Steinwandt, Applying Grover’s algorithm to AES: quantum resource estimates, arXiv:1512.04965

Quantum algorithms – a first example

We have now seen how quantum gates implement unitary transformations on qubits. Intuitively, we should, as in the classical case, now be able to combine these quantum gates into quantum algorithms that perform more complex sequences of manipulations of a quantum state. In this post, we will look at a first example to get an idea how this works. The material in this section is well covered by standard textbooks, like [1] section 3.8 or [2], chapter 6 and chapter 7. In addition, the fundamental structure of quantum algorithm is discussed in the classical papers [3] and [4].

In general, a quantum algorithm can be broken down into several phases. First, we need to initialize the system in a known initial state. Then, we apply a series of transformations, which, as we know, can be described in terms of quantum gates. Finally, we perform a measurement to derive a result.

To illustrate this, let us assume that we are given a more or less arbitrary classical problem that we want to solve using a quantum computer. Formally, this could be a function

f \colon {\{0,1\}}^n \rightarrow {\{0,1\}}^m

mapping binary numbers of length n to binary numbers of length m. We might have an algorithm to calculate this function on a classical computer and are looking for a quantum version of this algorithm.

The first problem we need to solve is that, as we know, quantum algorithms are given by unitary transformations, in particular they are reversible. In the most general setting however, a function f like the one above is not reversible – even if n is equal to m, there is no guarantee that the function is invertible. Fortunately, that can easily be fixed. If we let l = m + n and consider the function

\{0,1\}^l \rightarrow \{0,1\}^l

that maps a pair (x,y) with x \in \{0,1\}^n and y \in \{0,1\}^m to (x, y \oplus f(x)) , where \oplus is the bitwise exlusive or, we obtain an invertible function (as y = (y \oplus f(x)) \oplus f(x)) ), and we can recover our old function f if we let y = 0 .

Now let us consider a quantum computer that has l qubits, so that its states are given by elements of a Hilbert space of dimension 2l. We can then identify the bit strings of length l, i.e. the elements of \{0,1\}^l, with the natural numbers between 0 and 2l-1 and these numbers in turn with elements of the standard basis – the bit string 0 \dots 11 for instance would correspond to the element of the standard basis that we denote by |3\rangle. Thus, the function described above defines a permutation of the elements of the standard basis. Consequently, by the rules of linear algebra, there is exactly one unitary transformation Uf with the property that (with that identification)

U_f(|x, y\rangle) = |x, y \oplus f(x)\rangle

The transformation Uf is a unitary transformation and, as any unitary transformation, can therefore be decomposed into a sequence of elementary quantum gates. Let us now see how these quantum gates yield an algorithm to calculate f.

Preparing the initial state

First, we need to place our quantum system in some well defined state. For our example, we will use the so-called Hadamard-Walsh transformation to do this. This transformation is defined as the n-fold tensor product of the Hadamard operator H with itself and often denoted by W. Let us see how this operator acts on the state |00 \dots 0 \rangle. First, we can write this as

W |00 \dots 0 \rangle = (H \otimes \dots \otimes H) (|0\rangle \otimes \dots \otimes |0 \rangle)

Applying the Hadamard operator to each of the n factors individually, we see that this is

W |00 \dots 0 \rangle = \frac{1}{\sqrt{2^n}} (|0\rangle + |1\rangle) \otimes \dots \otimes (|0 \rangle + |1\rangle)

Multiplying out this expression, we see that each possible combination of ones and zeros appears exactly once within the sum. In other words, each of the states |x\rangle appears exactly once, where x ranges from 0 to 2n-1. Consequently, we see that

W |00 \dots 0 \rangle = \frac{1}{\sqrt{2^n}} \sum_0^{2^{n-1}} |x\rangle

In our case, the state space is a product of n qubits represented by the variable x and m qubits represented by the variable y. As initial state for our quantum algorithm, we will now choose the state that is obtained by applying the Hadamard-Walsh operator W to the first n qubits of the state |0,0\rangle and the identity to the remaining m qubits. Thus, our initial state will be

(W \otimes I) (|0, 0\rangle) = \frac{1}{\sqrt{2^n}} \sum_0^{2^{n-1}} |x, 0\rangle

Transformation and measurement

Having defined the initial state, let us now apply the transformation Uf. Using the fact that this transformation is of course linear and the formula for U_f|x,y\rangle obtained above, we easily see that applying this transformation to our initial state yields the state

U_f (W \otimes I) (|0, 0\rangle) = \frac{1}{\sqrt{2^n}} \sum_0^{2^{n-1}} |x, f(x)\rangle

It is worth pausing for a moment to take a closer look at this expression. The right hand side of this expression is a superposition of states that encodes the value of f(x) for all possible values of x. Thus, if we can find a way to physically implement quantum devices that are able to

  • Initialize an n+m qubit quantum system in a well-known state like |0,0\rangle
  • realize the transformations W \otimes I and U_f by a set of quantum gates

we are able to create a state that somehow computes all values f(x) in only one pass of the quantum algorithm.

At the first glance, this sounds like magic. Regardless how big n is, we would be able to calculate all values of f(x) in only one pass of our quantum algorithm! In other words, we would calculate all values of f(x) fully in parallel. This (apparent) feature of quantum computers is often referred to be the term quantum parallelism.

However, there is a twist. Even if we are able to apply a quantum algorithm in order to put our system in a state as above encoding all values of f(x), we still need to be able to extract the actual values of f(x) from that state. Thus, we need to perform a measurement.

If, for instance, we are using a measurement device that corresponds to a projection onto the individual states |x,y\rangle , i.e. to the basis of our Hilbert space, then applying that measurement would yield a state

|x, f(x)\rangle

for some value of x. Unfortunately, we cannot tell upfront to which of these vectors we would project. In fact, all these vectors appear with equal amplitude and the probability to obtain each of these vectors is \frac{1}{2^n}. Thus to retrieve the value of f(x) for a given value of x, we would have a rather small probability to be successful with only one pass of the algorithm.

But the situation is even worse – once we have executed our measurement, the superposition created by our algorithm is destroyed and the system has collapsed to one of the eigenstates. Thus, if we wanted to perform a large number of measurements in order to obtain f(x) for a given value of x with high probability, we would have to execute our algorithm again for each of the measurements, resulting in a large number of passes. It appears that the advantage of quantum parallelism is lost again….

Fortunately, the situation is not as hopeless as one might think. In many cases, we are not actually interested in all values of f(x). Rather, we are interested in some property of the function f, maybe even in a property that has a simple yes-no answer and can therefore be represented by a single bit. If we find a way to adapt our measuring process by projecting to a different basis or by transforming our state further before measuring, we might be able to get our answer with one measurement only. In the next section, we will look at an algorithm which is in a certain sense the blueprint and inspiration of many other quantum algorithms exploiting this approach.

The Deutsch-Jozsa algorithm

The Deutsch-Jozsa algorithm is a quantum algorithm that tackles the following problem. Suppose we are given a function

f \colon \{0,1\}^n \rightarrow \{0,1\}

We call such a function balanced if the number of times this function takes the value 0 is equal to the number of times this function takes the value 1 (which is obviously only possible if n is even which we will assume).

Now assume that we already know that either f is balanced or f is constant. We want to determine which of these two properties holds. With a classical algorithm, we would have to call f at most n/2 + 1 times, but might as well need this number of invocations for some functions f. In [4], Deutsch and Jozsa presented a quantum algorithm to solve this problem which requires only two applications of the unitary transformation Uf which is the equivalent of applying f in the quantum world, which is generally considered to be the first quantum algorithm that demonstrates a true advantage compared to a classical computation.

To describe the algorithm, we follow the general approach outlined above. We consider a quantum system with n+1 qubits and again denote a basis of the corresponding Hilbert space by |x, y \rangle where x is a n-bit number and y is 0 or 1. We again obtain an initial state

|\Phi_0 \rangle = (H \otimes \dots H \otimes I) |0,0 \rangle = \frac{1}{\sqrt{2^n}} \sum_x |x,0 \rangle

by applying the Hadamard-Walsh transformation to the state |0,0 \rangle . The algorithm then proceeds by applying Uf resulting in the state

U_f |\Phi_0 \rangle =  \frac{1}{\sqrt{2^n}} \sum_x |x, f(x) \rangle

Now an additional “smart” operation comes into play. First, we use an operator which is denoted by S in [4] and is defined by

S|x,y\rangle = (-1)^y |x,y \rangle

(this is a bit in conflict with our previous notation where we have used the symbol S for the phase shift operator, but let us again this for now to stay aligned with the notation in [4]). When we apply the operator S to our state, we obtain

S U_f |\Phi_0 \rangle =  \frac{1}{\sqrt{2^n}} \sum_x (-1)^{f(x)} |x, f(x) \rangle

Finally, we apply the inverse of Uf, which is actually Uf itself, as a straightforward calculation on basis vectors using the definition of Uf shows. Thus we end up with the state

|\psi \rangle = U_f S U_f |\Phi_0 \rangle =  \frac{1}{\sqrt{2^n}} \sum_x (-1)^{f(x)} |x, 0 \rangle

Let us now compute the overlap, i.e. scalar product, of this state with the state |\Phi_0 \rangle . As clearly

\langle \Phi_0 | x,0 \rangle  = \frac{1}{\sqrt{2^n}}

we obtain

\langle \Phi_0 | U_f S U_f |\Phi_0 \rangle =  \frac{1}{2^n} \sum_x (-1)^{f(x)}

Now suppose that we perform a measurement corresponding to the projection onto the subspace spanned by |\Phi_0 \rangle and let us try to understand the possible outcomes of the measurement. If the function f is constant, then the above sum is simply \pm 1 , depending on the sign of f. Thus our measurement will give us one with certainty, as our state is simply a scalar multiple of |\Phi_0 \rangle . If, however, f is not constant but balanced, the sum is zero. Thus we will obtain the value zero with probability one. Thus we can decide whether f is balanced or constant with only one measurement!

How would we actually perform such a measurement, assuming that we can measure the individual qubits of our quantum computer? To perform a measurement corresponding to the projection onto |\Phi_0 \rangle , a natural approach is to apply a transformation which takes |\Phi_0 \rangle to one of the basis vectors |x, y \rangle, say to |0,0 \rangle . But this is exactly the Hadamard-Walsh transformation. As this transformation is unitary and hermitian, we have

\langle \Phi_0 | U_f S U_f |\Phi_0 \rangle  =  \langle 0,0  | (W \otimes I) U_f S U_f |\Phi_0 \rangle

and we see that

\langle 0,0  | (W \otimes I) U_f S U_f |\Phi_0 \rangle = \frac{1}{2^n} \sum_x (-1)^{f(x)}

We can now repeat our previous argument to link the structure of f to the outcome of a measurement. In fact, if f is constant, the scalar product is one and our final state will be a multiple of |0,0 \rangle , if f is balanced, the scalar product will be zero and our state has no overlap with |0, 0 \rangle.

Now measurement in the standard basis corresponds to successive measurement of all individual qubits. Thus in the first case, all bits will be measured as zero with certainty, and in the second case, at least one bit will be measures as one with certainty. Hence we arrive at the following form of the algorithm.

  • Prepare the quantum system in the initial state |0, 0 \rangle
  • Apply the Hadamard-Walsh transformation to the first n qubits
  • Apply Uf, then apply the phase shift operator f and then apply Uf once more
  • Finally apply the Hadamard-Walsh transformation once more to the first n qubits and measure all qubits
  • If the measurement yields zero for all qubits, the function f is constant, otherwise the function f is balanced

This version of the algorithm, which is essentially the version presented in the original paper [4], uses two invocations of Uf. Later, a slighly more efficient version of the algorithm was developed ([5]) that only requires one application of Uf. From a theoretical point of view, it is less important whether we need one or two “calls” to Uf – the crucial point is that quantum parallelism makes the number of invocations constant, independent of the value of n. However, from a practical point of view, i.e. when it comes to an implementation on a real quantum computer, being able to reduce the number of involved qubits and gates is of course essential. We briefly sketch the required circuit which we will also see again when we discuss other algorithms.

Given a function f, this circuit will create the state |\psi \rangle from the | \Phi_0 \rangle with only one application of Uf. It uses an ancilla qubit that is initialized in the state |1\rangle . The graphical representation is as follows.


Let us quickly check that the circuit is doing what we want. It starts with the superposition across all states x that we denote by

| \Phi  \rangle =  \frac{1}{\sqrt{2^n}} \sum_x |x\rangle

Then we add an ancilla qubit initialized to |1\rangle to which we apply the Hadamard transformation. Now our state is

| \Phi  \rangle \otimes H|1\rangle =  \frac{1}{\sqrt{2^{n+1}}} \sum_x (|x, 0 \rangle - |x,1 \rangle)

We now let Uf act on this state. If f(x) is zero for some value of x, this transformation maps |x, 0 \rangle to itself and |x, 1 \rangle to itself. If, however, f(x) is one, it maps |x, 0 \rangle to |x, 1 \rangle and |x, 1 \rangle to |x, 0 \rangle . Thus we find that

U_f \sum_x (|x, 0 \rangle - |x,1 \rangle) = \sum_x (-1)^{f(x)} ( |x, 0 \rangle - |x,1 \rangle)

and consequently

U_f | \Phi  \rangle \otimes H|1\rangle =  \frac{1}{\sqrt{2^{n+1}}} \sum_x (-1)^{f(x)}  (|x, 0 \rangle - |x,1 \rangle)

which is the same as

\frac{1}{\sqrt{2^n}}  (\sum_x (-1)^{f(x)} |x\rangle) \otimes H|1\rangle

Now applying the Hadamard operator once more to the ancilla qubit followed by the negation operator X transforms the ancilla qubit to |0 \rangle and we end up with the state |\psi \rangle as claimed.

The Deutsch-Jozsa algorithm is widely acknowledged as the first real quantum algorithm that leverages the potential of a quantum computer and achieves a real improvement compared to classical hardware, while still yielding a deterministic result despite the stochastic nature of quantum systems. Of course, the problem solved by the algorithm is mainly of academical interest. In the next few posts, we will move on to study other algorithms that have been developed in the sequel that are more relevant for actual applications.


1. G. Benenti, G. Casati, G. Strini, Principles of Quantum Computation and Information, Volume 1, World Scientific
2. E. Rieffel, W. Polak, Quantum computing – a gentle introduction, MIT Press
3. D. Deutsch, Quantum Theory, the Church-Turing principle and the Universal Quantum Computer, Proceedings of the Royal Society of London A. Vol. 400, No. 1818 (1985), pp. 97-117
4. D. Deutsch, R. Jozsa, Rapid Solution of Problems by Quantum Computation, Proceedings of the Royal Society of London A, Vol. 439, No. 1907 (1992), pp. 553-558
5. R. Cleve, A. Ekert, C. Macchiavello, M. Mosca, Quantum algorithms revisited, arXiv:quant-ph/9708016

Quantum gates

So far, we have looked at states of a quantum computer and expressed these states in terms of qubits. However, just having a static state is of very limited use – to perform actual computations, we of course have to change our state over time.

In quantum mechanics, state changes are described by unitary transformations on the Hilbert space describing our quantum system. In analogy with a classical computer, where signals traverse logical gates and are manipulated by these gates to perform calculations, these transformations are called quantum gates. However, please keep in mind that in most actual implementations of quantum computers, these gates are not circuits or building blocks in the classical sense through which the qubits somehow travel. Instead, the qubits are mostly at a fixed point in space, and a gate is applied to a quantum state by manipulating the system, using for instance laser beams or magnetic fields.

One qubit quantum gates

Let us again start with the case of a single qubit, i.e. with a two-dimensional Hilbert space. With respect to the standard basis \{ |0\rangle, |1\rangle \} , a unitary transformation is of course given by a unitary 2×2 matrix.

As a first example, let us consider the matrix

X = \begin{pmatrix} 0 & 1 \\ 1 & 0\end{pmatrix}

Obviously, this operator exchanges the states |0 \rangle and |1 \rangle and is therefore sometimes called the bit-flip or negation. Its eigenvalues are 1 and -1, with a pair of orthogonal and normed eigenvectors given by

|+\rangle = \frac{1}{\sqrt{2}} (|0\rangle + |1 \rangle)


|-\rangle = \frac{1}{\sqrt{2}} (|0\rangle - |1 \rangle)

A second commonly used operator is the operator conventionally denoted by Z and given by

Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}

Clearly, the eigenvalues of this operator are again +1 and -1, but this time, an eigenbasis is given by the standard basis. On a superposition, this operator changes the relative phase, so it is often called the phase change or phase flip operator. A reader with a background in quantum physics might recognize these two operators as being two of the three Pauli operators. They all have determinant -1, trace 0, are hermitian and unitary and their square is the identity operator.

The eigenstates of X form an orthogonal basis, and hence they can be mapped into the standard basis by applying a unitary transformation. The corresponding operator is the Hadamard operator given by

H = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}

As we can see, H |0\rangle = |+ \rangle and H |1 \rangle = |- \rangle , and, as H^2 = 1 , vice versa. Similarly, the phase operator

S = \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix}

that changes the relative phase to an imaginary value transforms between the eigenstates of X and the third Pauli operator. The two gates H and S are sometimes referred to as Clifford gates (they have the property that conjugation with them takes the set of Pauli matrices onto itself).

Similar to classical circuits, quantum gates and combinations of quantum gates are often represented graphically. A single quantum gate is represented by a box, with incoming qubits (i.e. the qubits on which the transformation acts) being indicated by incoming lines on the left and the result being represented by outgoing lines on the right.


In the example above, the first line shows two individual gates, the negation and the phase flip operator. The second line graphically represents the process of applying to a single qubit first the negation and then the phase flip. Of course, the resulting matrix is given by the matrix product

ZX = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}  = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}

Note that the order of the matrix product is reversed, as we first apply X and then Z, reading the diagram from the left to the right which is the usual convention. We also note that this matrix is sometimes called Y, whereas other authors use Y to denote -i times this matrix. We stick to the convention to use Y for the product ZX given above.

Multi-qubit gates – the CNOT and other controlled gates

Having discussed some of the fundamental quantum gates acting on one qubit, let us now turn again to multi-qubit systems. Of course, we can always create transformations on a multi-qubit system by acting on each qubit individually with a given unitary operator, i.e. by forming the tensor product of operators. Graphically, this would be represented as gates placed below each other, each acting only on one line, i.e. qubit. However, things get more interesting when we use gates that are actually combining multiple qubits to model some non-trivial logic.

As an example, let us consider a gate acting on two qubits that is commonly known as the CNOT gate. To express this gate as a matrix, we have to agree on an order of the basis of the tensor product, and is is natural to use the order |0 \rangle, |1\rangle, |2\rangle, |3\rangle for this. In this basis, the CNOT gate is given by the following 4 x 4 matrix:

CNOT = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1  \\ 0 & 0 & 1 & 0  \end{pmatrix}

To understand what this operator is actually doing, let us see how it acts on a basis. Clearly, it maps |0 \rangle = |00 \rangle and |1 \rangle = |0 1 \rangle to itself and swaps the vectors |2 \rangle = |10 \rangle and |3 \rangle = |11 \rangle . Thus, expressed in terms of bits, it negates the second bit if the first bit is ON and keeps the second bit as it is if the first bit is OFF. Thus, the first bit can be thought of as a control bit that controls whether the second bit – the target bit – is toggled or not, and this is where the name comes from, CNOT being the abbreviation for “controlled NOT”. Graphically, the CNOT gate is represented by a small circle on the control bit and an encircled plus sign on the target bit (this is the convention used in [1], whereas some other authors, for instance [2], use a slightly different symbol).


An interesting property of the CNOT gate is that it takes separable states to entangled states. Let us for instance compute the action of the CNOT gate on the state (|0 \rangle + |1 \rangle) |0 \rangle . This state can as well be written as

|00 \rangle + |10 \rangle = |0 \rangle + |2\rangle

from which we can read off that

CNOT( (|0 \rangle + |1 \rangle) |0 \rangle) = |00 \rangle + |11 \rangle

which, up to a normalization factor, is an entangled Bell state. Thus we can use the CNOT gate to create entangled states. We also see that, other than the symbol does suggest, the operator does act on the control bit as well.

The notation of a controlled NOT gate can be extended to cover more general gates. Given any unitary one-qubit transformation U, we can in fact consider the two-qubit operator that applies the identify to the second qubit if the first qubit (the control bit) is |0 \rangle and U otherwise. As a matrix, this is given by a 4 x 4 block matrix, with the identity matrix in the upper left corner and the matrix representing U in the lower right corner, and graphically, such a gate is represented as below.


To get used to the graphical notation, let us take a look at the following two-qubit circuit and try to understand what it does to the standard basis.


There are several approaches to figure out what this circuit is doing. First, we can look at its action on each of the basis vectors. Let us do this for the basis vector |00 \rangle . We read the circuit from the left to the right and the top to the bottom. Thus, the first part of the circuit acts with H on the first qubit and with the identity on the second qubit. It therefore maps the state

|00 \rangle = |0 \rangle \otimes |0 \rangle

to the state

H|0 \rangle \otimes |0 \rangle = \frac{1}{\sqrt{2}} (|0 \rangle + |1 \rangle) \otimes |0 \rangle = \frac{1}{\sqrt{2}} (|00 \rangle + |10 \rangle)

The second part of the circuit, the CNOT gate, then maps |00 \rangle to itself and |10 \rangle to |11 \rangle . Thus the entire circuit maps the vector |00 \rangle to

\frac{1}{\sqrt{2}} (|00 \rangle +|11 \rangle)

which again is an entangled Bell state. We could now proceed like this for the other basis vectors. Alternatively, we can find the answer by multiplying out the matrices and will arrive at the same result.

Universal quantum gates

In classical computing, a general circuit with n input bits and one output bit can be described by a function taking \{0, \dots, 2^n - 1 \} to \{0,1\} , and it is well known that any such function can be expressed as a combination of gates from a small set of standard gates called universal gates. In fact, it is even true that one gate is sufficient – the NAND gate. It is natural to ask whether the same is true for quantum gates. More precisely, can we find a finite set of unitary transformations such that any n-qubit unitary transformation can be composed from them using a few standard operations like the tensor product?

Unfortunately, this does not work in the quantum world. The problem is that the group of unitary matrices is not discrete, but continuous, and thus we can never hope to be able to generate it fully using only a finite number of different gates. However, something very similar is true – every gate can be approximated by combining gates from a small, universal set.

This result is typically presented and derived in two steps (see for instance [1], chapter 3). First, one shows that every unitary transformation of an n-qubit system can be reduced to a combination of single qubit operations (tensored with the identity) and CNOT gates. This is still exact, i.e. no approximation takes place (for the mathematically inclined reader, this might ring a bell – in fact, there is a relation to Cartan decompositions of unitary matrices, see for instance [3] for a discussion).

Then, in a second step, one shows that every single qubit transformation can be approximated arbitrarily close by products of a finite universal set of gates – this is known as the Solovay-Kitaev theorem. It turns out (see for instance [4]) that a possible choice for such a universal set consists of the Hadamard gate H, the phase flip gate Z and only one additional gate

\begin{pmatrix} 1 & 0 \\ 0 &  e^{i\frac{\pi}{4}} \end{pmatrix}

which is confusingly sometimes called the \pi / 8 phase gate.

Finally, we mention that a three-qubit gate that is often used to construct universal sets of quantum gates is the Toffoli gate. This is like a CNOT gate, except that it has two control bits and the target bit is only flipped if both control bits are ON. It is known that the set consisting of the Toffoli gate and the Hadamard gate only is already universal in a similar sense (see this paper by D. Aharonov for a short proof of this result that goes back to Y. Shi). This result is of a certain importance because one can also show that the classical version of the Toffoli gate alone is sufficient to implement (a reversible version of) any classical circuit. Thus we do not only see that any computation that can be done by a classical circuit can also be done by a quantum circuit, but also that without the Hadamard gate, quantum computing would not be more powerful than classical computing.

This closes our short discussion of elementary quantum gates. In the next few posts, we will start to combine quantum gates into quantum algorithms to perform actual computations. Stay tuned!


1. G. Benenti, G. Casati, G. Strini, Principles of Quantum Computation and Information, Volume 1, World Scientific
2. E. Rieffel, W. Polak, Quantum computing – a gentle introduction, MIT Press
3. N. Khaneja, S.J. Glaser, Cartan Decomposition of SU(2 n ), Constructive Controllability of Spin Systems and Universal Quantum Computing, arXiv:quant-ph/0010100v1
4. P. Boykin, T. Mor, M. Pulver, V. Roychowdhury, F. Vatan, A new universal and fault-tolerant quantum basis
5. D. Aharonov,A Simple Proof that Toffoli and Hadamard are Quantum Universal, arXiv:quant-ph/0301040

Qubits and Hilbert spaces

When you use your favorite search engine to search for information on quantum computing, the first term that will most likely jump at you is the qubit. In this post, I will try to explain what this is and how it is related to the usual framework of quantum mechanics.

Please be aware that this post is not meant to be a general introduction into quantum mechanics. I will have to assume some familiarity with the basics of quantum mechanics and the underlying mathematics (Hilbert spaces, states and measurements, operators, eigenvalues and so forth).


In classical computing, the world is binary – information is expressed in terms of bits, i.e. units of information that can take on two values only – 1 or 0, true or false, high or low and so on, depending on the actual physical representation.

In the world of quantum computing, the smallest unit of information is represented by “the smallest quantum system”. But what is the smallest quantum system? Any nontrivial quantum system can actually be in an infinite number of states, and therefore it does not make sense any more to look for systems with only two states. What we can do, however, is to look for systems that have only two degrees of freedom. Mathematically, these are quantum mechanical systems that are described by a two-dimensional Hilbert space. In a certain sense, these are the smallest nontrivial quantum systems one can conceive, and similar to the classical world, where is a bit is the smallest nontrivial unit, these systems are called qubits and the fundamental building blocks in quantum computing.

I admit that this definition is a bit abstract, to let us look at some examples. An example that is often cited but mostly of theoretical interest is the spin. Imagine we are looking at a single, isolated charged particle, say an electron. Classically, this particle is described by its current position and its current movement (speed and direction). On a quantum mechanical level, it turns out that there is an additional property that becomes important – the electron somehow behaves as if it were rotating around an internal axis (this is only a very vague analogy as an electron is a point particle and does not have a classical size, but it is a useful description). Classically, you would expect that that rotation can happen in two ways – clockwise and counterclockwise. Let us use the notation |0 \rangle and |1 \rangle for these two possibilities. In a classical setting, we could then try to encode one bit in the state of such a system, where TRUE might correspond to |1 \rangle and FALSE to |0 \rangle .

However, quantum mechanics is more complicated than this. In fact, it turns out that the correct description for all possible states of such a system is not just the two options |0 \rangle and |1 \rangle . Rather, the system can be in so called superposition of these two states. Mathematically, the two states |0 \rangle and |1 \rangle are taken to be the basis of a two-dimensional Hilbert space, and the superposition is described by a linear combination

a |0 \rangle + b |1 \rangle

with complex numbers a and b (more precisely, by the ray in the projective space of that Hilbert space).

There is no good classical analogy for such a state – trying to imagine that something is spinning “a little bit clockwise” and at the same time “a little bit counterclockwise” will make your head spin in either direction, but not take you anywhere. However, we get closer to the classical world if we conduct a measurement. In quantum mechanics, a measurable quantity is described by a hermitian operator, the possible outcomes are the eigenvalues of this operators and after the measurement, the system will be in the state described by one of the (normed) eigenvectors of the operators. In our case, a measurement might be set up in such a way that this operator has the basis \{ |0 \rangle, |1 \rangle \} as eigenvectors, with eigenvalues 0 and 1, i.e. it is given by the matrix

\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}

Of course, any two quantum systems which are described by a two-dimensional Hilbert space are equivalent, so every other system with that property could serve as a physical implementation of a qubit as well. This could for instance be a linearly polarized photon, with an eigenbasis given by the two orthogonal directions of polarization, or it could be a hydrogen atom with the ground state and the first excited state as the states |0 \rangle and |1 \rangle . From a theoretical point of view, all these systems will do, but of course there are huge differences when it comes to a physical implementation of a quantum computer that we will discuss in a later post in this series.

Multi-qubit systems and Hilbert spaces

It is nice to have a qubit, but you will probably anticipate that there is not so much we can do with a single qubit. Instead, we will need a large number of qubits that can interact. As always in quantum mechanics, the state of such a combined system is described by another Hilbert space, namely the tensor product of the single qubit Hilbert spaces.

So suppose that we are describing a single qubit by a two-dimensional Hilbert space H. A system of n identical qubits would then be described by the n-fold tensor product

H \otimes H \dots \otimes H

of H with itself. Given states |\psi_i \rangle , we can form the tensor product

| \psi_1 \rangle \otimes | \psi_2 \rangle  \dots \otimes | \psi_n \rangle

of these states. This gives us elements of the tensor product, but conversely, not every element of the tensor product can be written in this way (this is an important fact to which we will return below). It is, however, true that every state in the tensor product is a finite linear combination of product states. This is a major difference to the other common way to define a product of vector spaces, the direct product. In a direct product, the dimensions add up, in a tensor product, they multiply. Thus when we add a qubit, the dimension of our Hilbert space doubles and therefore grows exponentially with the number of qubits.

This is a good point in time to simplify our notation a bit. First, it is common to suppress the tensor product when it comes to states and to express the vector above simply as

| \psi_1 \rangle | \psi_2 \rangle  \dots | \psi_n \rangle

Going further, we can also combine everything into one bra and write

| \psi_1 \psi_2 \dots \psi_n \rangle

If we do this for the elements of a basis, we can simplify our notation even more. Let us do this for the states of a 2-qubit system first. By building tensor products of the base states |0 \rangle and |1 \rangle , we obtain four vectors

|00 \rangle, |01 \rangle, |10 \rangle  and |11 \rangle

These four states actually form a basis for the tensor product. This is true in general – all possible tensor products of the vectors of a basis are a basis of the tensor product space. Now recall that, in binary notation, the 2-bit sequences 00, 01, 10 and 11 stand for the number 0 – 3. Using this, we could also write this basis as

|0 \rangle, |1 \rangle, |2 \rangle  and |3 \rangle

For a three qubit system, we could use the same notation to arrive at the basis

|0 \rangle = |000 \rangle, |1 \rangle = |001\rangle, \dots 7 = |111 \rangle

and so forth. In general, for a combined system of n qubits, we can label our basis with the numbers 0 to 2n-1, and our tensor product will have dimension 2n, with a general element being given as a sum

|\psi \rangle = \sum_0^{2^n-1} a_i |i\rangle

Entangled states

To close this post, let us come back to one very important point that is a major ingredient to the theoretical power of quantum computing – entanglement. Suppose we are given a two-qubit system. We have seen that certain states can be obtained by taking the tensor product of single qubit states. Can every state in the tensor product be obtained in this way?

Let us try this out. If we take the tensor product of two arbitrary single qubit states a |0\rangle + b |1 \rangle and c |0\rangle + d |1 \rangle , we obtain

ac |00\rangle + ad |01 \rangle + bc |10 \rangle + bd |11 \rangle

Now let us consider the state

| \psi \rangle = \frac{1}{\sqrt{2}} ( |00\rangle + |11 \rangle )

To decompose this state as a product, we would need ad = 0 and ac = 1, which is only possible if d = 0. However, then bd = 0 as well, and we get the wrong coefficient for |11 \rangle . Thus this is an example of a state which cannot be written as a tensor product of two single qubit states (but it can be written, of course, as a sum of such states). States that can be decomposed as a product are called separable states, and states that are not separable are called entangled states.

The state above is often called a Bell state and it is worth mentioning that states like this and entanglement features in one of the most famous though experiments in the history of quantum mechanics, the so called Einstein-Podolsky-Rosen paradox or ERP paradox – a broad topic that we will not even try to cover here. We note, however, that it is entanglement that is responsible for the fact that the dimension of the state space grows exponentially with the number of qubits, and most quantum algorithms use entangled states somehow, so entanglement does seem to play a vital role in explaining why quantum computing seems to be superior to classical computing, see for instance section 13.9 of “Quantum computing – a gentle introduction” for a more in-depth discussion of the role of entanglement in quantum algorithms.

Visualizing quantum states

Before closing this post and turning to quantum gates in the next post in this series, let us quickly describe a way how to visualize a multi-qubit quantum state. Our starting point is the expression

|\psi \rangle = \sum_0^{2^n-1} a_i |i\rangle

for the most general superposition in an n-qubit system. To visualize that state, we could use a bar char as in the diagram below.


Each tick on the horizontal axis represents one of the basis states |i \rangle . In our example, we have eight basis states, corresponding to a quantum system with 3 qubits. The bar represents the amplitude, i.e. the number a_i in the superposition above (of course this is only a very incomplete representation, as it represents the amplitudes as real numbers, whereas in reality they are complex numbers). The upper state in the diagram represents a superposition for which all of the amplitudes are different from zero. The second diagram represents a state in which all but one amplitude is zero, i.e. a state that is equivalent to one of the basis vectors (|2\rangle in this case). These states correspond to the eight states of a corresponding classical system with three bits, whereas all states with more than one bar are true superpositions which have no direct equivalent in the classical case. We will later see how similar diagrams can be used to visualize the inner workings of simple quantum algorithms.

Qubits are nice, but in order to perform computations, we have to manipulate them somehow. This is done using quantum gates which I will discuss in my next post.

Quantum computing

When I started this blog, my intention was to document my own attempts to learn and understand some of the topics at the intersection of computer science, physics and mathematics that have the potential to change the world in which we work and live. After spending some time with one of these topics – machine learning and artificial intelligence – I have now started to look into one of the most exciting combinations of these subjects – quantum computing.

Roughly speaking, quantum computing refers to leveraging the laws of quantum mechanics – like superposition and entanglement – to build computing devices that operate in a way fundamentally different from traditional computers and allow us to solve traditionally hard computational problems in a reasonable amount of time.

How exactly this works is difficult to explain without referring to non-trivial quantum mechanics and mathematics. It is often said that quantum computing is exploiting the superposition principle – the fact that the state of a quantum system can be described as a superposition of many different fundamental states at the same time – to perform many calculations parallel in one step that a classical computer can only perform serially. However, we will see that this is a bit of an oversimplification as it ignores the fact that a measurement destroys this parallelism.

In addition, even though a quantum computer is theoretically able to perform any calculation that a classical computer can do, it is not clear that we will see a drastic speedup for arbitrarily chosen classically hard problems. In fact, as of today, we know a few problems – like the factorization of large composite numbers into primes – that can be solved using quantum computers while being intractable for traditional hardware, and we know of some areas of applications like simulations or optimization and machine learning, where we have good reasons to expect that quantum computing will provide a massive gain in speed and capacity, but there is strong evidence that not all classically hard problems can be solved efficiently on a quantum computer (see for instance [1]). And sometimes, we find a fast quantum algorithm for something that so far could not be done efficiently on a classical computer, but this algorithm then inspires a classical solution providing the same speedup (the recommendation problem seems to be an example for that fruitful interaction between quantum computing and classical computer science).

So even a fully operational universal quantum computer would not out of a sudden make classical computing obsolete and perform better in all known instances. Rather, we should expect that at least in the near future, the main applications of quantum computing will be restricted to a few dedicated fields, with classical computing evolving in parallel (see for instance [2] for a discussion).

This does not make quantum computing less exciting. After all, quantum computing changes the way how we think about things like information and algorithms substantially. However, implementing a working quantum computer remains hard. In later posts, we will look at some of the technologies like quantzum error correction, superconducting devices and ion traps that are currently used by researchers around the globe to build quantum computing devices at scale. We have seen some very exciting progress recently, with e.g. Google announcing its new Bristlecone device with 72 qubits or IBM making a 5 qubit quantum computer publicly accessible in the cloud as part of their Q Experience program. There are even simulators like Quirk that allow you to put together quantum circuits online and play with the results or frameworks like Microsofts quantum development kit or the API and framework developed by 1QBit that support the development and debugging of quantum algorithms before trying them on a real quantum device. So it is definitely a good point in time to get to know some of the basics of quantum computing, covering theory, applications and implementation – and this is the goal of this new series.

Currently, my plans are to cover the following topics in the upcoming posts.