# Chapter 2 Quantum computing and quantum algorithms

In this chapter we will introduce the preliminaries needed in this book. We will make extensive use of linear algebra (norms of matrices, the SVD, properties of particular classes of matrices, and so on), so the reader is highly encouraged to refer to the appendix regarding the notation adopted.

## 2.1 Getting rid of physics in quantum computing

Following one of the lectures of
Susskind, we are going to
start from a “handwavy” introduction of quantum mechanics, which starts
from a few considerations and leads straight to the Schrödinger equation.
With a few mathematical tricks, we are going to justify intuitively the 4 axioms of quantum mechanics stated in the next sections. The hope is that the
reader can be gently guided from a tiny physical intuition to a greater
understanding of the **axioms of quantum mechanics**. While “axioms of quantum mechanics” suggests that physics is involved, thanks to the formulation we adopt (see for instance (Nielsen and Chuang 2002)), we can ignore the physics and think of them as “axioms of quantum computing”. As Scott
Aaronson rightly said, if
you are not a physicist, you need to remove the physics from quantum
mechanics to understand it!

The objective is that the reader should *not* feel the need to dig into
quantum mechanical details of quantum computers, but can start solely from the 4 axioms of quantum mechanics and build (a lot) from there.

At the beginning of the 20th century, when physicists started to model quantum phenomena, they observed that the evolution in time and space of quantum systems had two properties: it was continuous and reversible (as in classical mechanics). They decided to formalize these properties as follows. First, they modeled the state of a quantum system at time \(p\) as a function \(\psi(p)\), and the evolution of \(\psi(p)\) for a time \(t\) as an operator \(U(t)\) acting on states \(\psi(p)\). Let \(I\) be the identity operator. Formally, the two requirements can be written as:

- \(U(\epsilon)=I-i\epsilon H\) (continuity)
- \(U^\dagger(t)U(t)=I\) (reversibility)

The first requirement says that if we apply an evolution for a small amount of time \(\epsilon\), then \(U\) behaves almost as the identity \(I\), differing by a small multiple of another operator \(H\). The second requirement says that the operation \(U\) can be undone by applying the conjugate transpose of \(U\): by doing so we obtain the identity operator, meaning that we return to the state of the system before it evolved. From these two requirements, we can observe that:

\[I = U^\dagger(\epsilon)U(\epsilon)= (I+i\epsilon H^\dagger)(I-i\epsilon H) = I -i \epsilon H + i \epsilon H^\dagger + O(\epsilon^2).\] The only way for this equation to hold is to have \(H=H^\dagger\), i.e. the operator \(H\) must be equal to its conjugate transpose. In mathematics, we call such operators Hermitian! (More about them in the appendix!). Now we can ask ourselves what happens when we apply \(U(\epsilon)\) to a quantum state \(\psi(t)\). Well, it is simple to see now: \[U(\epsilon)\psi(t) = \psi(t+\epsilon) = \psi(t) -i \epsilon H\psi(t).\] With a little algebra we can rewrite the previous equation as:

\[ \frac{\psi(t+\epsilon) - \psi(t)}{\epsilon} = -i H\psi(t).\] Note that the left-hand side of this equation can be rewritten, in the limit \(\epsilon \to 0\), as a derivative: \[\frac{d}{dt}\psi(t) = -i H \psi(t).\] But this is the well-known Schrödinger equation! Note that, as computer scientists, we take the liberty to remove a physical constant (\(\hbar\)) from the equation. What is the takeaway of these observations? First, we know that the Schrödinger equation is a differential equation whose solution is fully determined once we know the initial state of our system \(\psi(p)\). Formally, the solution can be written as:

\[\psi(p+t)=e^{-iHt}\psi(p).\] From this last equation we can observe
further (more on this in the appendix) that the exponential of a
Hermitian matrix \(e^{-iHt}\), which is *defined* through its Taylor expansion, is
just a *unitary* matrix: \(U(t)=e^{-itH}\). Unitary matrices are exactly
those matrices that describe isometries: applying a unitary matrix to a
vector does not change its length. From this, we see that the two quantum
states \(\psi(p+t)\) and \(\psi(p)\) can be taken to be vectors of a
fixed length, which - for practicality - we take to be unit vectors.
Notation-wise, we denote unit vectors describing quantum states as
“kets”, i.e. we rewrite this equation as:

\[|\psi(p+t)\rangle=U(t)|\psi(p)\rangle\] Hopefully, this introduction is enough to build a better intuition of what comes next, and to give you a “justification” for the axioms of quantum mechanics.
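The derivation above is easy to check numerically. The following is a minimal sketch (in Python with numpy; the matrix \(H\) is an arbitrary \(2\times 2\) Hermitian example): exponentiating \(H\) via its spectral decomposition yields a unitary \(U(t)=e^{-iHt}\), which preserves the norm of any state.

```python
import numpy as np

# Hypothetical 2x2 example: a Hermitian H gives a unitary U(t) = e^{-iHt}.
H = np.array([[1.0, 0.5], [0.5, -1.0]])          # H == H^dagger
assert np.allclose(H, H.conj().T)

# e^{-iHt} via the spectral decomposition H = V diag(w) V^dagger
w, V = np.linalg.eigh(H)
t = 0.7
U = V @ np.diag(np.exp(-1j * w * t)) @ V.conj().T

assert np.allclose(U.conj().T @ U, np.eye(2))     # U is unitary
psi = np.array([0.6, 0.8])                        # a unit vector
assert np.isclose(np.linalg.norm(U @ psi), 1.0)   # the norm is preserved
```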

## 2.2 Axioms of quantum mechanics

The standard formalism used in quantum information is Dirac’s “bra-ket” notation, which we introduce in this section. We also recall here the postulates of quantum mechanics, and take this opportunity to settle the rest of the notation and preliminaries used in this book. For the postulates, we follow the standard formulation in (Nielsen and Chuang 2002).

**Proposition 2.1 (Postulate 1) **Associated to any isolated physical system is a complex vector space with inner product (that is, a Hilbert space) known as the state space of the system. The system is completely described by its state vector, which is a unit vector in the system’s state space.

As quantum states are described by unit vectors, we write \(|\psi\rangle\) for a unit vector \(\psi \in \mathcal{H}^d\), where \(\mathcal{H}^d\) is a \(d\)-dimensional Hilbert space. So for a non-normalized vector \(x \in \mathbb{R}^d\), the normalized quantum state is represented as \(|x\rangle = \left\lVert x\right\rVert^{-1}x = \frac{1}{\left\lVert x\right\rVert}\sum_{i=0}^{d-1} x_i|i\rangle\), where \(\left\lVert x\right\rVert\) is the \(\ell_2\) norm of the vector \(x\). We denote by \(\{|i\rangle\}_{i\in[d]}\) the canonical (also called computational) basis of the Hilbert space. The transpose-conjugate of \(|x\rangle\) is denoted by \(\langle x|\). We can think of \(|x\rangle\) as a column vector, and of \(\langle x|\) as a row vector whose entries have been conjugated. In Dirac’s notation, we denote the inner product between two vectors as \(\langle x|y\rangle\). Their outer product is denoted as \(|x\rangle\langle y| = \sum_{i,j \in [d]}x_i \overline{y}_j |i\rangle\langle j| \in \mathcal{H}^d\otimes \mathcal{H}^d\). The smallest (non-trivial) quantum system is called a qubit, and is a \(2\)-dimensional unit vector in \(\mathbb{C}^2\). A basis for this vector space is denoted in quantum notation by \(|0\rangle\) and \(|1\rangle\). In this case, the vector \(|\varphi\rangle = \alpha|0\rangle+\beta|1\rangle\) for \(\alpha,\beta\in \mathbb{C}\) represents a valid quantum state as long as \(|\alpha|^2 + |\beta|^2 = 1\).
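These conventions map directly onto linear-algebra code. A minimal numpy sketch (the vectors below are arbitrary examples): kets are column vectors, bras are conjugate transposes, \(\langle x|y\rangle\) is the inner product and \(|x\rangle\langle y|\) the outer product.

```python
import numpy as np

# |x> = x / ||x||: a valid quantum state is a unit vector
x = np.array([3.0, 1.0, -2.0, 1.0])
ket_x = x / np.linalg.norm(x)
assert np.isclose(np.linalg.norm(ket_x), 1.0)

# a qubit alpha|0> + beta|1> is valid iff |alpha|^2 + |beta|^2 = 1
alpha, beta = 1 / np.sqrt(2), 1j / np.sqrt(2)
assert np.isclose(abs(alpha) ** 2 + abs(beta) ** 2, 1.0)

# inner product <x|y> and outer product |x><y|
ket_y = np.array([0.0, 1.0, 0.0, 0.0])
inner = ket_x.conj() @ ket_y               # <x|y>, a scalar
outer = np.outer(ket_x, ket_y.conj())      # |x><y|, a 4x4 matrix
assert np.isclose(inner, ket_x[1])         # <x|1> picks the second amplitude
assert outer.shape == (4, 4)
```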

**Proposition 2.2 (Postulate 2) **The evolution of a closed quantum system is described by a unitary transformation. That is, the state \(|\psi\rangle\) of the system at time \(t_1\) is related to the state \(|\psi'\rangle\) of the system at time \(t_2\) by a unitary operator \(U\) which depends only on the times \(t_1\) and \(t_2\).

A matrix \(U\in \mathbb{C}^{d \times d}\) is said to be unitary if \(UU^\dagger = U^\dagger U = I\), that is, if the inverse of \(U\) is equal to its conjugate transpose. From this fact it follows that unitary matrices are norm-preserving, and thus can be used as a suitable mathematical description of a pure quantum evolution. It is a standard exercise to see that the following are all equivalent definitions of a unitary matrix \(A\) (de Wolf 2019):

- \(\langle Av, Aw\rangle = \langle v,w\rangle\) for all \(v,w\).
- \(\left\lVert Av\right\rVert = \left\lVert v\right\rVert\) for all \(v\)
- \(\left\lVert Av\right\rVert = 1\) if and only if \(\left\lVert v\right\rVert=1\).
- \(A\) is a normal matrix with eigenvalues lying on the unit circle
- The columns and the rows of \(A\) form an orthonormal basis of \(\mathbb{C}^d\)
- \(A\) can be written as \(e^{iH}\) for some Hermitian operator \(H\).

**Example 2.1 (Determinant of unit modulus is a necessary but not sufficient condition for being unitary) **It is simple to see that the \(2\times 2\) diagonal matrix \(A\) with entries \(10\) and \(1/10\) has determinant \(1\), but it is not a unitary matrix.
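We can check the example numerically (a small numpy sketch): \(A\) has determinant \(1\) yet fails the unitarity condition \(A^\dagger A = I\), while a genuinely unitary matrix such as the Hadamard gate passes it.

```python
import numpy as np

# The example above: det(A) = 1, but A^dagger A != I
A = np.diag([10.0, 0.1])
assert np.isclose(np.linalg.det(A), 1.0)
assert not np.allclose(A.conj().T @ A, np.eye(2))

# The Hadamard gate, by contrast, is unitary
Hgate = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
assert np.allclose(Hgate.conj().T @ Hgate, np.eye(2))
```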

It will be useful to recall that if we have a unitary that performs the
mapping \(|a_i\rangle \mapsto |b_i\rangle\), we can have the “matrix” form of
the operator as \(\sum_i |b_i\rangle\langle a_i|\). Recall also that the Pauli
matrices are both unitary *and* Hermitian, and this fact will be useful
in many places throughout this text.

**Exercise 2.1 (From (Huang, Bharti, and Rebentrost 2019)) **Let \(k \in \{0,1\}^n\) be an arbitrary \(n\)-bitstring.
Let \(A=(\sigma_{x}^{(1)})^{k_1} \otimes \dots \otimes (\sigma_{x}^{(n)})^{k_n}\) and \(|b\rangle=|0^n\rangle\).
What is the solution to the equation \(A|x\rangle =|b\rangle\)?

**Proposition 2.3 (Postulate 3) **Quantum measurements are described by a collection \(\{M_m\}\) of measurement operators. These are operators acting on the state space of the system being measured. The index \(m\) refers to the measurement outcomes that may occur in the experiment. If the state of the quantum system is
\(|\psi\rangle\) immediately before the measurement, then the probability that the result \(m\) occurs is given by
\[
p(m) = \langle\psi|M^\dagger_m M_m |\psi\rangle
\]
and the state of the system after the measurement is
\[
\frac{M_m|\psi\rangle}{\sqrt{\langle\psi|M_m^\dagger M_m|\psi\rangle}}
\]
The measurement operators satisfy the completeness equation
\[
\sum_m M^\dagger _m M_m = I
\]

In practice, we will mostly perform projective measurements (also called
von Neumann measurements). A projective measurement is described by an
*observable*: an Hermitian operator \(M\) on the state space of the system
being observed. The observable has a spectral decomposition:
\[ M = \sum_m mP_m, \] where \(P_m\) is the projector onto the eigenspace of
\(M\) associated with the eigenvalue \(m\). This means that the measurement
operators satisfy the following properties:

- \(P_m\) is positive semidefinite
- \(P_m\) is Hermitian
- \(\sum_m P_m = I\)
- \((P_m)(P_n) = \delta_{mn}(P_m)\) are orthogonal projections.

Recall that an orthogonal projector \(P\) has the properties that \(P=P^{\dagger}\) and \(P^2 = P\). Note that the second property in the list above derives from the first: all positive semidefinite operators on a complex Hilbert space are Hermitian (this is not always the case on real vector spaces, as it is simple to find positive definite matrices that are not symmetric). Projective measurements can be understood as a special case of Postulate 3: in addition to satisfying the completeness relation \(\sum_m M_m^\dagger M_m = I\), the measurement operators are also orthogonal projectors. Given a state \(|\psi\rangle\), the probability of measuring outcome \(m\) is given by:

\[\begin{equation} p(m) = \langle\psi|P_m|\psi\rangle. \tag{2.1} \end{equation}\]

If we were to measure outcome \(m\), then the state of the quantum system after the measurement would be: \[\frac{P_m|\psi\rangle}{\sqrt{p(m)}}.\]

They have some useful properties. Just to cite one, the average value of a projective measurement on a state \(|\psi\rangle\) is: \[\begin{align} E(M)& = \sum_m m\, p(m)\\ & = \sum_m m \langle\psi|P_m|\psi\rangle\\ & = \langle\psi|\left(\sum_m mP_m\right)|\psi\rangle\\ & = \langle\psi|M|\psi\rangle \end{align}\] In practice, our projective operators will be projectors onto the computational basis, i.e. \(P_m = |m\rangle\langle m|\) for \(m \in [d]\). From these rules, it is simple to see that the probability that a measurement on a state \(|x\rangle = \frac{1}{\left\lVert x\right\rVert}\sum_i x_i|i\rangle\) gives outcome \(i\) is \(|x_i|^2/\left\lVert x\right\rVert^2\).
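As a quick illustration of this rule, we can sample measurement outcomes classically (a numpy sketch; the vector \(x\) is an arbitrary example): outcome \(i\) occurs with probability \(|x_i|^2/\lVert x\rVert^2\), and the empirical frequencies match.

```python
import numpy as np

# Born rule in the computational basis for |x> = x / ||x||
rng = np.random.default_rng(0)
x = np.array([3.0, 0.0, 4.0])
probs = np.abs(x) ** 2 / np.linalg.norm(x) ** 2     # [0.36, 0.0, 0.64]
samples = rng.choice(len(x), size=100_000, p=probs)

assert np.isclose(probs.sum(), 1.0)
assert abs((samples == 0).mean() - 0.36) < 0.01     # empirical ~ theoretical
assert (samples == 1).sum() == 0                    # zero amplitude: never observed
```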

**Proposition 2.4 (Postulate 4) **The state space of a composite physical system is the tensor product of the state spaces of the component physical systems. Moreover, if we have systems numbered from 1 through \(n\), and each state is described as \(|\psi_j\rangle\), the joint state of the total system is \(\bigotimes_{j=1}^n |\psi_j\rangle=|\psi_1\rangle|\psi_2\rangle\dots |\psi_n\rangle\).

To describe two different quantum systems together we use the tensor product. The tensor product between two vectors \(|x\rangle \in \mathbb{R}^{d_1}\) and \(|y\rangle \in \mathbb{R}^{d_2}\) is a vector \(|z\rangle \in \mathbb{R}^{d_1 d_2}\). We can use the tensor operation to describe the joint evolution of separate quantum systems.

Even if it is not explicitly used much in quantum algorithms, it is useful to recall the definition of entangled pure state.

**Definition 2.1 (Entangled state) **A quantum state that cannot be expressed as a tensor product of two quantum states is said to be entangled.

The same thing can be done for operators. Let \(U_1\) be the unitary describing the evolution of a quantum state \(|x\rangle\) and \(U_2\) the unitary describing the evolution of a quantum state \(|y\rangle\). Then \(U_1 \otimes U_2\) describes the evolution of the quantum system \(|x\rangle\otimes |y\rangle\).

From the definition of tensor product, we see that the tensor product of \(n\) qubits (i.e. \(n\) different \(2\)-dimensional vectors) is a vector of size \(2^n\). Thus, to build a quantum state that stores \(d\) numbers in its amplitudes we need \(\lceil \log d\rceil\) qubits. More precisely, observe that for a vector \(\vec{v} \in \mathcal{H}^d\), if we want to build \(|\vec{v}\rangle = \frac{1}{\|\vec{v}\|} \vec{v}\) we need only \(\lceil \log d \rceil\) qubits. If \(d\) is not a power of \(2\), we consider the vector padded with \(0\)s up to the nearest power of \(2\) bigger than \(d\). This will allow us to build states of the form \(\frac{1}{\|\vec{v}\|}\sum_{i=0}^{d-1} v_i |i\rangle\). This fact will be used a lot in our quantum algorithms, especially in quantum machine learning. We will discuss in chapter @ref(#chap-classical-data-quantum-computers) how to create these kinds of quantum states.
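The size counting above is easy to verify with numpy's `kron` (an illustrative sketch; the choice of \(|0\rangle\) and \(|+\rangle\) is arbitrary): tensoring \(n\) qubits produces a vector of \(2^n\) amplitudes, still of unit norm.

```python
import numpy as np

# n qubits -> a 2^n-dimensional state vector, built with np.kron
ket0 = np.array([1.0, 0.0])
ket_plus = np.array([1.0, 1.0]) / np.sqrt(2)

state = ket0
for _ in range(4):                       # |0> tensor |+>^{otimes 4}
    state = np.kron(state, ket_plus)

assert state.shape == (2 ** 5,)          # 5 qubits -> 2^5 = 32 amplitudes
assert np.isclose(np.linalg.norm(state), 1.0)
```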

### 2.2.1 Review of important statements in quantum computation

Before delving into a review of quantum algorithms, we would like to state here a few important lemmas.

**Lemma 2.1 (Hadamard on a bitstring (Nielsen and Chuang 2002)) **Let \(x \in \{0,1\}^n\) be a bitstring, and \(H\) be the Hadamard gate. Then:

\[\begin{equation} H^{\otimes n}|x\rangle = \frac{1}{\sqrt{2^n}}\sum_{z_1, \dots z_n \in \{0,1\}^n} (-1)^{x_1z_1 + x_2z_2 + \dots + x_nz_n} |z_1, \dots, z_n\rangle = \frac{1}{\sqrt{2^n}}\sum_{z\in \{0,1\}^n} (-1)^{x^Tz} |z\rangle, \end{equation}\]

where \(x^Tz\) is the inner product of bitstrings in \(\mathbb{Z}_2^n\), taken modulo \(2\).
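A brute-force numerical check of the lemma for a small \(n\) (a numpy sketch; the bitstring \(x\) is an arbitrary example): every amplitude of \(H^{\otimes n}|x\rangle\) equals \((-1)^{x^Tz}/\sqrt{2^n}\).

```python
import numpy as np
from itertools import product

n = 3
Hgate = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
Hn = Hgate
for _ in range(n - 1):
    Hn = np.kron(Hn, Hgate)              # H^{otimes n}

x = (1, 0, 1)                            # bitstring, i.e. the state |101>
ket_x = np.zeros(2 ** n)
ket_x[int("".join(map(str, x)), 2)] = 1.0
out = Hn @ ket_x

# compare each amplitude with (-1)^{x.z} / sqrt(2^n)
for idx, z in enumerate(product([0, 1], repeat=n)):
    expected = (-1) ** (sum(xi * zi for xi, zi in zip(x, z)) % 2) / np.sqrt(2 ** n)
    assert np.isclose(out[idx], expected)
```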

## 2.3 Measuring complexity of quantum algorithms

This section is an attempt to organize in a coherent way some fundamental concepts in quantum computer science. Some references for this section are (Dörn 2008), (de Wolf 2019), and (Kuperberg 2011). In this section we assume the reader is comfortable with big-O notation.

Intuitively, the time complexity of an algorithm is the number of operations the computer has to perform from the beginning to the end of the execution of the algorithm. This model is easy to understand when we think about a personal computer with one CPU (more precisely, a von Neumann architecture). What is the right way of measuring the complexity of (quantum) circuits? For example, suppose we have a quantum circuit of \(n\) qubits, and to each qubit we apply one Hadamard gate: what is the time complexity for running this circuit? Is it \(O(n)\) or \(O(1)\)? We will discuss here different ways of measuring the complexity of a quantum algorithm: the number of gates, the depth of the circuit, and the number of queries to an oracle.
The first two are a direct generalization of the measures of complexity of Boolean circuits, where we count the number of gates (serial time) or the depth of the circuit (parallel time). The third measure of complexity is the query complexity, which is the number of times we use (or query) an oracle: a circuit or a data structure that we can use as a black box. In this section we will define more formally the query complexity of a quantum algorithm, and in the next chapter we will explain possible implementations of different kinds of oracles. The *time* complexity, which is ultimately the most relevant measure of complexity of an algorithm for most practical purposes, is trickier to define properly. If we assume that our hardware is capable of executing all the gates at the same depth in parallel, then the measure of time complexity becomes the depth of the quantum circuit. We denote by \(T(U)\) the time complexity needed to implement a quantum circuit implementing a unitary \(U\), measured in terms of **number of gates**, i.e. the size of the circuit (this is quite standard; see for example the introduction of (Andris Ambainis et al. 2022)). This is a concept that bears some similarity with the clock rate of classical CPUs. If our hardware is not capable of executing all the gates at the same depth in one clock, we might have to consider the time complexity as the number of gates (possibly divided by the number of gates that can be executed at the same time). An important exception to this rule is the query complexity, where we consider the cost of a single query to an oracle as \(O(1)\), i.e. as any other gate. This is somewhat justified because for certain oracles we have efficient implementations (i.e. with depth polylogarithmic in the size of the oracle).
For this, the (often unstated) assumption is that the part of the quantum computer dedicated to running the main part of the circuit is evaluated in serial time (i.e. by the number of gates), while the part of the circuit dedicated to executing the oracle is evaluated by its parallel time complexity. Examples of this, which we will treat in more detail in the next chapter, are the QRAM and the QRAG gate. This assumption allows us to justifiably conflate the query complexity of an algorithm with its time complexity.

Last but not least, everything is made even more complicated by error correction. In fact, on some architectures and with some error-correcting codes, the time complexity of the algorithm is well approximated by the number of \(T\) gates of the circuit.

In this book, we use the standard notation of \(\widetilde{O}\) to hide the polylogarithmic factors in the big-O notation of the algorithms: \(O(n\log(n))=\widetilde{O}(n)\). The meticulous reader may notice that in this way we might make the notation ambiguous, as it might not be clear what is hidden in the \(\widetilde{O}\) notation. The correct way of using this notation is to hide only factors that appear with a dependency bigger than the logarithm (i.e. \(O(n\log(n))=\widetilde{O}(n)\) but \(O(n\log(n)\log(1/\epsilon))=\widetilde{O}(n\log(1/\epsilon))\)). Another possibility, to make our runtimes more precise, is to use a subscript to specify the factors that we choose to hide: \[\begin{equation} \widetilde{O}_{\epsilon,\delta}\left(\frac{1}{\epsilon}\right) = O\left(\frac{1}{\epsilon}\log(1/\epsilon)\log(1/\delta)\right) \end{equation}\]

**Definition 2.2 (Quantum query or oracle access to a function) **Let \(\mathcal{H}\) be a finite-dimensional Hilbert space with basis \(\{0,1\}^{n}\). Given \(f:\{0,1\}^n\to\{0,1\}^m\), we say that we have quantum query access to \(f\) if we have access to a unitary operator \(U_f\) on \(\mathcal{H}\otimes\mathbb{C}^{2^m}\) such that \(U_f|x\rangle|b\rangle = |x\rangle|b \oplus f(x)\rangle\) for any bit string \(b\in\{0,1\}^m\). One application of \(U_f\) costs \(T_f\) operations.

**Definition 2.3 (Quantum computation in the query model) **Let \(O_x\) be a unitary operator that encodes the input of our computation, and acts in a non-trivial way on its associated Hilbert space.
A quantum computation with \(T\) queries to an oracle \(O_x : |i,b,z\rangle \mapsto |i,b \oplus x_i, z\rangle\) is a sequence of unitary transformations:

\[U_TO_xU_{T-1}O_x \dots U_1O_xU_0\]

where each \(U_t\) is a unitary that does not depend on the input of the algorithm. The output of the computation is obtained by measuring the rightmost register (by convention).

Note that the second register holds the XOR of the \(i\)-th component of the input with the previous state of the register (i.e. \(b\)). This makes the computation reversible. Importantly, definition 2.2 is just one example of a function to which we can have query access. We can assume query access to unitaries creating various kinds of quantum states as output. We will see many examples of oracles in definitions 3.3, 3.8, 3.9, and 3.10.
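For intuition, an oracle as in definition 2.2 can be written down explicitly as a permutation matrix on the joint register. A numpy sketch for a hypothetical single-output function (the choice of \(f\) below is arbitrary): the resulting \(U_f\) is unitary, and self-inverse thanks to the XOR trick.

```python
import numpy as np

n = 2                                 # input qubits; one output qubit

def f(x):
    # hypothetical Boolean function on labels x in [0, 2^n)
    return int(x % 3 == 0)

# U_f acts as |x>|b> -> |x>|b xor f(x)>; index 2*x + b encodes (x, b)
dim = 2 ** n * 2
U_f = np.zeros((dim, dim))
for x in range(2 ** n):
    for b in range(2):
        U_f[2 * x + (b ^ f(x)), 2 * x + b] = 1.0

assert np.allclose(U_f.conj().T @ U_f, np.eye(dim))   # U_f is unitary
assert np.allclose(U_f @ U_f, np.eye(dim))            # XOR makes it self-inverse
```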

This is the so-called query model, or oracle model, of (quantum) computation. An important point here is the last statement of Definition 2.2, on the cost of applying \(U_f\), which is \(O(1)\). There are multiple reasons for working in this model, which might appear unrealistic at this point. First, it is often the case that queries to these oracles are actually efficient (as we will see in many examples), so the query complexity is actually equivalent (up to multiplicative polylogarithmic factors) to the depth of the quantum circuit that is going to be executed. Another reason is that in the oracle model it is relatively simple to prove lower bounds and results about the complexity of an algorithm in terms of the number of queries to an oracle that encodes the input of the problem. For complex results in quantum algorithms, it is customary to separate the study of the query complexity of the problem from the depth or gate complexity of the quantum circuit that is executed on the real hardware. We formalize this difference in the following definitions.

**Definition 2.4 (Query complexity of an algorithm) **The quantum query complexity of a quantum algorithm \(\mathcal{A}\) is the number of queries to a black-box made by \(\mathcal{A}\) in order to compute \(f\).

If we just care about the **relativized** complexity, we can limit
ourselves to comparing two algorithms that solve the same problem in terms
of the number of queries to a given oracle; we might then observe that one is
faster than the other. This is a **relativized** speedup. The opposite
is an **absolute** speedup, i.e. when we also take into account the
complexity of the operations that are **not** queries to an oracle. In
the case of quantum algorithms, these might simply be the gate depth of
the circuit.

**Definition 2.5 (Circuit complexity or time complexity) **The quantum circuit complexity (or time complexity) of a quantum algorithm \(\mathcal{A}\) is the depth of the quantum circuit implementing \(\mathcal{A}\).

Quantum computing is not the only place where we measure the complexity in terms of query to an oracle. In fact, it’s sufficient to do a few “queries” (pun intended) on your search engine to realize that in many computational models we have adopted this measure of computational complexity.

**Note that the query complexity of an algorithm is a lower bound on the
gate complexity of the quantum circuit.** It is often simpler to study
first the query complexity of a quantum algorithm and then study the
time complexity. For most quantum algorithms (but not all!) the time
complexity coincides with the query complexity, up to a logarithmic
factor. Note that, if we find a way to have an oracle whose depth (i.e.
circuit complexity) is only (poly)logarithmic in the input size, then
the query complexity and the gate complexity coincide up to a negligible
polylogarithmic factor. There are some exceptions. Most notably, there
is a quantum algorithm for the important *hidden subgroup problem* with
only polynomial query complexity, while the classical counterpart has a
query complexity that is exponential in the input size. Nevertheless,
the overall time complexity of the quantum algorithm is (to date) still
exponential, and polynomial-time quantum algorithms are known only for a
few specializations of the problem.

We now clarify some definitions that are used to describe the probabilistic behavior of an algorithm:

**Definition 2.6 (Randomized algorithms) **Let \(f : \{0,1\}^N \mapsto \{0,1\}\) be a Boolean function. An algorithm computes \(f\):

- **exactly** if the output equals \(f(x)\) with probability \(1\) for all \(x \in\{0,1\}^N\);
- with **zero error** if it is allowed to give the answer “UNDEFINED” with probability smaller than \(1/2\) (but if the output is \(0\) or \(1\), it must be correct);
- with **bounded error** if the output equals \(f(x)\) with probability greater than \(2/3\) for all \(x\in \{0,1\}^N\).

A bounded error (quantum or classical) algorithm that fails with
probability \(1/3\) (or any other constant smaller than \(1/2\)) is meant to
fail *in the worst-case*. We do not expect the algorithm to fail in the
average case, i.e. for most of the inputs (see Appendix of (de Wolf 2019)).

If a (quantum or classical) algorithm is said to output the right answer
in **expected** (often said “in expectation”) running time \(T\), we can
quickly create another algorithm that has **worst-case** guarantees on
the runtime. This is obtained using Markov’s inequality, i.e.
theorem C.2, as follows. Run the algorithm for \(kT\) steps,
i.e. stop the execution after \(kT\) steps if it hasn’t terminated
already. If \(X\) is the random variable of the runtime of the computation
(so \(\mathbb{E}[X]=T\)), then:

\[Pr\left[X > kT \right] \leq \frac{1}{k} \] So with probability \(\geq 1-\frac{1}{k}\) we will have the output of the algorithm.
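The truncation argument can be illustrated with a quick simulation (a sketch; the exponential runtime distribution is an arbitrary choice, used only for illustration): the fraction of runs exceeding \(kT\) never beats the Markov bound \(1/k\).

```python
import numpy as np

rng = np.random.default_rng(1)
T = 10.0                                      # expected runtime E[X] = T
runtimes = rng.exponential(T, size=200_000)   # hypothetical runtime distribution

for k in [2, 5, 10]:
    fail = (runtimes > k * T).mean()          # fraction of truncated runs
    assert fail <= 1.0 / k                    # Markov: Pr[X > kT] <= 1/k
```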

## 2.4 Review of famous quantum algorithms

In this chapter we will explore some introductory quantum algorithms. While some of them are not directly related to data analysis or machine learning, it is important to report them here because they help us better understand the model of quantum computation we adopt. Others will prove to be really useful to the quantum machine learning practitioner.

### 2.4.1 Deutsch-Jozsa

**Definition 2.7 (Constant function) **A function \(f :\{0,1\}^n \mapsto \{0,1\}\) is constant if \(f(x)=0 \forall x \in \{0,1\}^n\) or \(f(x)=1 \forall x \in \{0,1\}^n\).

**Definition 2.8 (Balanced function) **A function \(f :\{0,1\}^n \mapsto \{0,1\}\) is balanced if \(f(x)=0\) for half of the inputs and \(f(x)=1\) for the other half.

**Theorem 2.1 (Deutsch-Jozsa (Deutsch and Jozsa 1992)) **Assume to have quantum access (as in definition 2.2) to a unitary \(U_f\) that computes the function \(f :\{0,1\}^n \mapsto \{0,1\}\), which we are promised to be either constant or balanced. There is a quantum algorithm that decides which is the case with probability \(1\), using \(U_f\) only once and using \(O(\log(n))\) other gates.

*Proof*. We start our quantum computer by initializing \(n\) qubits in the \(|0\rangle\) state, followed by a single ancilla qubit initialized in the state \(|1\rangle\), which we will use for the phase kickback. Then, we apply the Hadamard transform to each of them. Mathematically, we are performing the following mapping:

\[\begin{equation} |0\rangle^{\otimes n}|1\rangle \mapsto \left(\frac{1}{\sqrt{2^n}}\sum_{x\in\{0,1\}^n} |x\rangle \right)|-\rangle \end{equation}\] Now we apply \(U_f\) using the first register (i.e. the first \(n\) qubits) as input and the ancilla register (the last qubit) as output. Our quantum computer is now in the state \[\left(\frac{1}{\sqrt{2^n}}\sum_{x\in\{0,1\}^n}(-1)^{f(x)}|x\rangle \right)|-\rangle\] Now we apply \(n\) Hadamard gates to the \(n\) qubits in the first register. Recalling lemma 2.1, this gives the state

\[\left(\frac{1}{2^n} \sum_{x\in\{0,1\}^n}(-1)^{f(x)} \sum_{j \in \{0,1\}^n }(-1)^{xj} |j\rangle \right) |-\rangle = \left(\frac{1}{2^n} \sum_{x\in\{0,1\}^n}\sum_{j \in \{0,1\}^n} (-1)^{f(x)+ xj} |j\rangle \right)|-\rangle\]

In this state, note that the normalization factor has changed from \(\frac{1}{\sqrt{2^n}}\) to \(\frac{1}{2^n}\), and recall that \((-1)^{xj}\) is read as \((-1)^{ \sum_{p} x_pj_p \,\mathrm{mod}\, 2 }\). The key idea of the proof lies in asking the right question about the previous state: what is the probability of measuring the state \(|0\rangle^{\otimes n}\) in the first register? The answer to this question will conclude the proof of this theorem. Before computing the probability, observe that the amplitude of the state \(|j\rangle\) for \(j=0_1\dots 0_n\) is just \(\frac{1}{2^n}\sum_{x}(-1)^{f(x)}\), since \(x^Tj=0\) for all \(x\) when \(j=0_1\dots 0_n\). Then,

\[\begin{equation} \frac{1}{2^n} \sum_{x \in \{0,1\}^n } (-1)^{f(x)} = \begin{cases} 1 & \text{if } f(x)=0 \,\forall x \\ -1 & \text{if } f(x)=1 \,\forall x \\ 0 & \text{if } f \text{ is balanced} \end{cases} \end{equation}\]

To conclude, note that if the function \(f\) is constant (first two cases), we will measure \(|0\rangle^{\otimes n}\) with probability \(1\), and if the function is balanced, we will measure some bitstring of \(n\) bits that is different from the string \(0_1\dots 0_n\).
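The whole proof can be replayed numerically on the state vector. A compact sketch (using the phase form \((-1)^{f(x)}\) of the oracle derived above; `f_values` is the truth table of \(f\)): the probability of reading \(|0\rangle^{\otimes n}\) is \(1\) for a constant \(f\) and \(0\) for a balanced \(f\).

```python
import numpy as np

def deutsch_jozsa_p0(f_values):
    """Return Pr[measuring |0...0>] after the Deutsch-Jozsa circuit."""
    n = int(np.log2(len(f_values)))
    state = np.full(2 ** n, 1 / np.sqrt(2 ** n))      # H^{otimes n}|0...0>
    state = state * (-1.0) ** np.array(f_values)      # phase kickback via U_f
    Hgate = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    Hn = Hgate
    for _ in range(n - 1):
        Hn = np.kron(Hn, Hgate)                       # final H^{otimes n}
    return abs((Hn @ state)[0]) ** 2

assert np.isclose(deutsch_jozsa_p0([0, 0, 0, 0]), 1.0)   # constant f
assert np.isclose(deutsch_jozsa_p0([0, 1, 1, 0]), 0.0)   # balanced f
```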

It’s simple to see that if we want to solve this problem with a
classical *deterministic* algorithm, we need exactly \(2^{n-1} + 1\)
queries. However, with a randomized algorithm we can
drastically reduce the number of queries by admitting a small
probability of failure.

**Exercise 2.2 **Can you think of an efficient randomized classical algorithm for solving this problem? Perhaps you can use the tools in the Appendix for randomized algorithms.

We now turn our attention to the first learning problem of this book. It is rarely stressed that the following algorithm can be interpreted as a learning algorithm.

### 2.4.2 Bernstein-Vazirani

**Theorem 2.2 (Bernstein-Vazirani) **Assume to have quantum access (as in definition 2.2) to a unitary \(U_f\) that computes the function \(f_a :\{0,1\}^n \mapsto \{0,1\}\) defined as \(f_a(x) = (x,a) = \left( \sum_{i=1}^n x_i a_i \right)\mod 2\) for a secret string \(a \in \{0,1\}^n\). There is a quantum algorithm that learns \(a\) with probability \(1\), using \(U_f\) only once and \(O(\log(n))\) other gates.

*Proof*. The algorithm follows exactly the same steps as the Deutsch-Jozsa algorithm. The proof is slightly different, and starts by noting that, after the application of the oracle \(U_f\), the register of our quantum computer is in the following state:

\[\begin{equation} \left(\frac{1}{\sqrt{2^n}} \sum_{x\in\{0,1\}^n}(-1)^{f(x)} |x\rangle \right) |-\rangle = \left(\frac{1}{\sqrt{2^n}} \sum_{x\in\{0,1\}^n} (-1)^{a^T x}|x\rangle \right)|-\rangle \end{equation}\]

Now we resort again to Lemma 2.1, and we use the fact that the Hadamard gate is also its own inverse (\(H^2 = I\)). Thus applying \(n\) Hadamard gates to the first register leads to the state \(|a\rangle\) deterministically.
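As for Deutsch-Jozsa, the algorithm is easy to simulate on the state vector (a sketch; the secret strings below are arbitrary examples): one phase query to \(f_a\) followed by \(H^{\otimes n}\) returns \(a\) deterministically.

```python
import numpy as np

def bernstein_vazirani(a_bits):
    """Simulate the circuit and return the measured integer, i.e. a."""
    n = len(a_bits)
    a_int = int("".join(map(str, a_bits)), 2)
    # f_a(x) = <a, x> mod 2, computed on the integer labels x
    f = np.array([bin(x & a_int).count("1") % 2 for x in range(2 ** n)])
    state = (-1.0) ** f / np.sqrt(2 ** n)         # state after the phase oracle
    Hgate = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    Hn = Hgate
    for _ in range(n - 1):
        Hn = np.kron(Hn, Hgate)
    out = Hn @ state                              # equals +/- |a>
    return int(np.argmax(np.abs(out)))            # deterministic outcome

assert bernstein_vazirani([1, 0, 1]) == 0b101
assert bernstein_vazirani([0, 1, 1, 0]) == 0b0110
```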

**Exercise 2.3 **Can you think of an efficient randomized classical algorithm for solving the Bernstein-Vazirani problem? You can use the tools in the Appendix for randomized algorithms.

Other material for learning about the Deutsch-Jozsa and Bernstein-Vazirani algorithms can be found in the lecture notes of Ronald de Wolf (de Wolf 2019).

### 2.4.3 Hadamard test

Let \(U\) be a unitary acting on \(n\) qubits, and \(|\psi\rangle\) a quantum state on \(n\) qubits (generated by another unitary \(V\)). We also require the ability to apply the controlled version of the unitary \(U\). Then, the Hadamard test is a quantum circuit that we can use to estimate the value of \(\langle\psi| U | \psi\rangle\). The circuit is very simple: it consists of a Hadamard gate applied to an ancilla qubit, the controlled application of the unitary \(U\), and another Hadamard gate.

The initial operation leads to \((H\otimes V) |0\rangle|0\rangle = |+\rangle|\psi\rangle\), then we have: \[ \begin{aligned} |\psi_{\text{final}}\rangle & =(H\otimes I)(cU )|+\rangle|\psi\rangle = (H\otimes I)\frac{1}{\sqrt{2}}\left(|0\rangle|\psi\rangle+|1\rangle U|\psi\rangle \right) \\ & =\frac{1}{2}\left(|0\rangle\left(|\psi\rangle + U|\psi\rangle \right) + |1\rangle\left(|\psi\rangle - U|\psi\rangle \right) \right) \end{aligned} \]

Note that the last state could be written equivalently, by just factoring out the \(|\psi\rangle\) state as \(|\psi_{\text{final}}\rangle=\frac{1}{2}\left(|0\rangle(I+U)|\psi\rangle + |1\rangle(I-U)|\psi\rangle \right)\). The probability of measuring \(0\) in the first qubit is:

\[\begin{align} p(0)= & \left\lVert\frac{1}{2}(I+U)|\psi\rangle \right\rVert_2^2 = \frac{1}{4} \left(\langle\psi| + \langle\psi|U^\dagger \right)\left(|\psi\rangle + U|\psi\rangle \right) \\ =& \frac{2+\langle\psi|(U+U^\dagger)|\psi\rangle}{4} = \frac{2+2\text{Re}(\langle\psi|U|\psi\rangle)}{4} \end{align}\]

Here we used Postulate 2.3 with the observable \(|0\rangle\langle 0|\otimes I\). The probability of measuring \(1\) in the ancilla qubit follows trivially.
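The derivation above is easy to verify numerically. The following sketch simulates the Hadamard test as a single matrix acting on the statevector; the random unitary \(U\) (built via a QR decomposition), the state \(|\psi\rangle\), and all names are our illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
d = 2**n

# Illustrative random unitary U and random state |psi>.
U, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
psi = rng.normal(size=d) + 1j * rng.normal(size=d)
psi /= np.linalg.norm(psi)

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
I = np.eye(d)
# Controlled-U: apply U on the second register when the ancilla is |1>.
cU = np.block([[I, np.zeros((d, d))], [np.zeros((d, d)), U]])

# Circuit: (H x I) cU (H x I) acting on |0>|psi>.
state = np.kron(H, I) @ cU @ np.kron(H, I) @ np.kron([1, 0], psi)

# Probability of measuring 0 on the ancilla = squared norm of the first block.
p0 = np.linalg.norm(state[:d]) ** 2
assert np.isclose(p0, (2 + 2 * np.real(psi.conj() @ U @ psi)) / 4)
```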

**Exercise 2.4 **Can you tell what the expected value of the observable \(Z\) on the ancilla qubit is? Remember that the possible outcomes of the observable \(Z\) are \(\{+1, -1\}\).

However, we might be interested in the imaginary part of \(\langle\psi|U|\psi\rangle\). To estimate that, we need to slightly change the circuit. After the first Hadamard gate, we apply to the ancilla qubit a phase gate \(S\), which gives the state \(|1\rangle\) a phase of \(-i\). To get the intuition behind this, recall that the imaginary part of a complex number \(z=a+ib\) is defined as \(\text{Im}(z)= \frac{z-z^\ast}{2i}=\frac{i(z-z^\ast)}{-2}= \frac{-2b}{-2} =b\), where after the definition we just multiplied numerator and denominator by \(i\), a trick that we will use again shortly. The rest of the circuit of the Hadamard test stays the same. The evolution of our state in the quantum computer is the following:

\[\begin{align} |\psi_{\text{final}}\rangle& =(H\otimes I)(cU )\frac{1}{\sqrt{2}}\left( |0\rangle - i|1\rangle\right)|\psi\rangle = (H\otimes I)\frac{1}{\sqrt{2}}\left(|0\rangle|\psi\rangle-i|1\rangle U|\psi\rangle \right) \\ & = \frac{1}{2}\left(|0\rangle\left(|\psi\rangle -iU|\psi\rangle \right) + |1\rangle\left(|\psi\rangle + i U|\psi\rangle \right) \right) \end{align}\]

The probability of measuring \(0\) is given by the following equation.

\[\begin{align} p(0)=\frac{1}{4}\left(\langle\psi|+i\langle\psi|U^\dagger \right)\left(|\psi\rangle-iU|\psi\rangle \right) = \frac{1}{4} \left(2 - i\langle\psi|U|\psi\rangle + i \langle\psi|U^\dagger|\psi\rangle \right) \end{align}\]

Note that when taking the conjugate of our state, we changed the sign of \(i\). We now only have to convince ourselves that \(-i\langle\psi|U|\psi\rangle + i \langle\psi|U^\dagger|\psi\rangle = i\langle\psi|(U^\dagger -U)|\psi\rangle\) is indeed the real number \(2\text{Im}(\langle\psi| U|\psi\rangle)\), so that the whole expression is a valid probability.
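This variant can be checked numerically as well. The sketch below repeats the previous simulation with the extra phase gate, here a \(\mathrm{diag}(1,-i)\) matrix matching the \(-i\) convention used in the text; the random \(U\) and \(|\psi\rangle\) are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2
d = 2**n

# Illustrative random unitary U and random state |psi>.
U, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
psi = rng.normal(size=d) + 1j * rng.normal(size=d)
psi /= np.linalg.norm(psi)

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
S = np.diag([1, -1j])  # phase gate giving |1> the phase -i, as in the text
I = np.eye(d)
cU = np.block([[I, np.zeros((d, d))], [np.zeros((d, d)), U]])

# Circuit: (H x I) cU (S x I) (H x I) acting on |0>|psi>.
state = np.kron(H, I) @ cU @ np.kron(S, I) @ np.kron(H, I) @ np.kron([1, 0], psi)

# p(0) now encodes the imaginary part of <psi|U|psi>.
p0 = np.linalg.norm(state[:d]) ** 2
assert np.isclose(p0, (2 + 2 * np.imag(psi.conj() @ U @ psi)) / 4)
```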

**Exercise 2.5 **Can you check whether the \(S\) gate that we apply after the first Hadamard can instead be performed before the last Hadamard gate?

### 2.4.4 Modified Hadamard test

In this section we build on the results of the previous one, by finding the number of samples that we need to draw from a circuit in order to estimate the expected value or probability of interest with a certain accuracy and a certain success probability.

**Theorem 2.3 (Modified Hadamard test (no amplitude amplification)) **Assume access to a unitary \(U_1\) that produces a state \(U_1 |0\rangle = |\psi_1\rangle\) and a unitary \(U_2\) that produces a state \(U_2 |0\rangle = |\psi_2\rangle\), where \(|\psi_1\rangle,|\psi_2\rangle \in \mathbb{C}^N\) for \(N=2^n, n\in\mathbb{N}\). There is a quantum algorithm that estimates the quantity \(\langle\psi_1|\psi_2\rangle\) with additive precision \(\epsilon\), with probability \(1-\delta\), using \(O(\frac{\log(1/\delta)}{\epsilon^2})\) controlled applications of \(U_1\) and \(U_2\).

*Proof*. Create a state \(|0\rangle|0\rangle\), where the first register is a single ancilla qubit and the second register has \(n\) qubits. Then, apply a Hadamard gate to the first qubit, to obtain \(|+\rangle|0\rangle\). Controlled on the ancilla being \(|0\rangle\), apply the unitary \(U_1\), and controlled on the ancilla being \(|1\rangle\), apply the unitary \(U_2\). Finally, apply again the Hadamard gate on the ancilla qubit. The state that we obtain is the following:

\[\begin{align} (H\otimes I ) \frac{1}{\sqrt{2}}\left( |0\rangle|\psi_1\rangle + |1\rangle|\psi_2\rangle \right) \\ = \frac{1}{2}\left(|0\rangle(|\psi_1\rangle + |\psi_2\rangle) + |1\rangle(|\psi_1\rangle - |\psi_2\rangle) \right) \end{align}\]

It is now easy to see that the probability of measuring \(0\) is:

\[\begin{equation} p(0)=\frac{2+2\text{Re}[\langle\psi_1|\psi_2\rangle]}{4} \end{equation}\]

We conclude the proof by recalling the Chernoff bound in theorem C.9, as we also do in the proof of the swap test below.
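Under the assumptions of the theorem, the whole argument can be checked numerically, including the sample count \(t=\frac{\log(1/\delta)}{2\epsilon^2}\) given by the Chernoff bound. In this sketch the random state-preparation unitaries and the values of \(\epsilon\) and \(\delta\) are illustrative choices of ours:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
d = 2**n

# Illustrative random state-preparation unitaries U1 and U2.
U1, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
U2, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
psi1, psi2 = U1[:, 0], U2[:, 0]  # U1|0> and U2|0>

# After H on the ancilla and the two controlled preparations the register
# holds (|0>|psi1> + |1>|psi2>)/sqrt(2); then a final H acts on the ancilla.
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
state = np.kron(H, np.eye(d)) @ np.concatenate([psi1, psi2]) / np.sqrt(2)

p0 = np.linalg.norm(state[:d]) ** 2
assert np.isclose(p0, (2 + 2 * np.real(psi1.conj() @ psi2)) / 4)

# Chernoff bound: t = log(1/delta)/(2 eps^2) shots estimate p(0)
# within eps with probability at least 1 - delta.
eps, delta = 0.05, 0.01
t = int(np.ceil(np.log(1 / delta) / (2 * eps**2)))
shots = rng.random(t) < p0  # simulated ancilla measurements
re_estimate = 2 * shots.mean() - 1  # invert p(0) = (1 + Re<psi1|psi2>)/2
```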

Can you think of reasons that might lead one to prefer the swap test
over the Hadamard test, or vice versa? At the end of the day, aren't they
both computing the same thing? For instance, note that the Hadamard
test requires the ability to call the *controlled* versions of
the unitaries \(U_1\) and \(U_2\), while the swap test can
treat the states as black boxes: they can come from
a quantum process, or from a quantum communication
channel.

### 2.4.5 Swap test

The swap test was originally proposed in (Buhrman et al. 2001), in the context of quantum fingerprinting, but it has quickly been extended to many other contexts. For us, the swap test is a way to obtain an estimate of the inner product between two quantum states. The difference with the Hadamard test is that here we don't assume access to the unitaries creating the states, hence we cannot perform controlled operations with them. You can think of receiving the states from a third party, i.e. via a communication protocol.

**Theorem 2.4 (Swap test (no amplitude amplification)) **Assume access to a unitary \(U_1\) that produces a state \(U_1 |0\rangle = |\psi_1\rangle\) and a unitary \(U_2\) that produces a state \(U_2 |0\rangle = |\psi_2\rangle\), where \(|\psi_1\rangle,|\psi_2\rangle \in \mathbb{C}^N\) for \(N=2^n, n\in\mathbb{N}\). There is a quantum algorithm that estimates the quantity \(|\langle\psi_1|\psi_2\rangle|^2\) with additive precision \(\epsilon\), with probability \(1-\delta\), using \(U_1\) and \(U_2\) \(O(\frac{\log(1/\delta)}{\epsilon^2})\) times.

*Proof*. Create a state \(|0\rangle|0\rangle|0\rangle\), where the first register is a single ancilla qubit, and the second and third registers have \(n\) qubits each. Then, apply a Hadamard gate to the first qubit, to obtain \(|+\rangle|0\rangle|0\rangle\). Next, apply \(U_1\) and \(U_2\) to the second and third registers, and then apply a swap between the two registers, controlled on the ancilla qubit. More precisely, we apply \(n\) controlled swap gates, each swapping the \(i\)-th qubit of the second register with the \(i\)-th qubit of the third. Thus, we obtain the state:

\[\begin{equation} \frac{1}{\sqrt{2}}\left[|0\rangle(|\psi_1\rangle|\psi_2\rangle) + |1\rangle(|\psi_2\rangle|\psi_1\rangle) \right] \end{equation}\]

we now apply another Hadamard gate on the ancilla qubit, in order to obtain the following state:

\[\begin{align} |\phi\rangle=& \frac{1}{\sqrt{2}}\left[\frac{1}{\sqrt{2}}\left(|0\rangle|\psi_1\rangle|\psi_2\rangle + |1\rangle|\psi_1\rangle|\psi_2\rangle\right) + \frac{1}{\sqrt{2}}\left(|0\rangle|\psi_2\rangle|\psi_1\rangle - |1\rangle|\psi_2\rangle|\psi_1\rangle \right) \right] \\ =& \frac{1}{2}\left[|0\rangle\left(|\psi_1\rangle|\psi_2\rangle + |\psi_2\rangle|\psi_1\rangle \right) + |1\rangle\left(|\psi_1\rangle|\psi_2\rangle - |\psi_2\rangle|\psi_1\rangle\right)\right] \end{align}\]

Now we consider the probability of measuring \(0\) and \(1\) in the ancilla qubit. More in detail, we want to estimate \(p(0)=\langle\phi|M_0|\phi\rangle\). For this, we recall Postulate 2.3, and more precisely equation (2.1), with \(M_0=|0\rangle\langle 0|\otimes I\), where \(I\) is the identity operator over \(2n\) qubits. It is simple to see that \(p(0)=\frac{2+2|\langle\psi_1|\psi_2\rangle|^2}{4}\).

By repeating this measurement \(O(\log(1/\delta)/\epsilon^2)\) times, following the statement of the Chernoff bound in theorem C.9, the number of samples needed to obtain error \(\epsilon\) with probability \(1-\delta\) is \(t=\frac{\log(1/\delta)}{2\epsilon^2}\). Once we have an estimate of \(p(0)\), we can recover the quantity of interest as \(|\langle\psi_1|\psi_2\rangle|^2 = 2p(0)-1\).
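The probability computed in the proof, and the inversion formula, can be verified with a direct simulation. Since the controlled swap only exchanges the two branches, we can write the final state by hand; the random states below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2
d = 2**n

# Two illustrative random n-qubit states.
psi1 = rng.normal(size=d) + 1j * rng.normal(size=d)
psi2 = rng.normal(size=d) + 1j * rng.normal(size=d)
psi1 /= np.linalg.norm(psi1)
psi2 /= np.linalg.norm(psi2)

# State after the controlled swaps: (|0>|psi1>|psi2> + |1>|psi2>|psi1>)/sqrt(2).
branch0 = np.kron(psi1, psi2)
branch1 = np.kron(psi2, psi1)

# Final Hadamard on the ancilla mixes the two branches.
plus = (branch0 + branch1) / 2   # amplitude vector attached to |0>
minus = (branch0 - branch1) / 2  # amplitude vector attached to |1>

p0 = np.linalg.norm(plus) ** 2
overlap_sq = abs(psi1.conj() @ psi2) ** 2
assert np.isclose(p0, (2 + 2 * overlap_sq) / 4)
assert np.isclose(overlap_sq, 2 * p0 - 1)
```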

**Exercise 2.6 (Obtain the absolute value of the inner product) **In the previous theorem we obtained an estimate of \(|\langle\psi_1|\psi_2\rangle|^2\) with a certain error \(\epsilon\). If we just take the square root of that number, what is the error in the estimate of \(|\langle\psi_1|\psi_2\rangle|\)? You are encouraged to read the section in appendix D on error propagation.


### References

*arXiv Preprint arXiv:2207.11068*.

Buhrman, Harry, Richard Cleve, John Watrous, and Ronald de Wolf. 2001. “Quantum Fingerprinting.” *Physical Review Letters* 87 (16): 167902.

*arXiv:1907.09415*.

Deutsch, David, and Richard Jozsa. 1992. “Rapid Solution of Problems by Quantum Computation.” *Proceedings of the Royal Society of London. Series A: Mathematical and Physical Sciences* 439 (1907): 553–58.

*arXiv Preprint arXiv:1909.07344*.

*arXiv Preprint arXiv:1112.3333*.