# Chapter 2 Quantum computing and quantum algorithms

In this chapter we will introduce the notation used throughout the rest of the lecture notes. We will extensively use linear algebra (norms of matrices, SVD, properties of particular matrices, and so on), so the reader is highly encouraged to skim the appendix on their own, so as to become familiar with the notation adopted here.

## 2.1 Getting rid of physics in quantum computing

Following one of Susskind's lectures, we are going to start from a “handwavy” introduction to quantum mechanics that begins from a few considerations and leads straight to the Schrödinger equation. With a few mathematical tricks, we are going to give an intuitive justification of all 4 axioms of quantum mechanics that are stated in the next section. The hope is that this introduction gently guides the reader from a tiny physical intuition to a greater understanding of the **axioms of quantum mechanics**. Although the name “axioms of quantum mechanics” might seem (obviously) related to physics, thanks to this formulation (which comes from (Nielsen and Chuang 2002)), we can eventually think of them as “axioms of quantum computing.” As Scott Aaronson rightly said, if you are not a physicist, you need to remove physics from quantum mechanics to understand it!

The objective is that the reader should *not* feel the need to dig into quantum mechanical details of quantum computers, but can comfortably sit on top of the 4 axioms of quantum mechanics, and build (a lot) from there.

When, at the beginning of the 20th century, physicists started to model quantum phenomena, they observed that the dynamics of these systems had two properties: the time and space evolution of quantum systems is continuous (as in classical mechanics) and reversible (unlike the classical world). They formalized these observations as follows. First, they modeled the state of a quantum system at time \(p\) as a function \(\psi(p)\), and the evolution of \(\psi(p)\) for a time \(t\) as an operator \(U(t)\) acting on \(\psi(p)\). Formally, the two requirements can be written as:

- \(U(\epsilon)=I-i\epsilon H\) (continuity)
- \(U^\dagger(t)U(t)=I\) (reversibility)

The first requirement says that if we apply the evolution for a small amount of time \(\epsilon\), then \(U\) behaves almost as the identity, plus a “small” contribution of another operator \(H\). The second requirement says that if we “undo” the operator \(U\) by applying its conjugate transpose, we obtain the identity, i.e. we have not changed the state of the system. From these two requirements, we can already derive the following observation.

\[I = U^\dagger(\epsilon)U(\epsilon)= (I+i\epsilon H^\dagger)(I-i\epsilon H) = I -i \epsilon H + i \epsilon H^\dagger + O(\epsilon^2).\] Neglecting the \(O(\epsilon^2)\) terms, the only way for this equation to hold is for \(H=H^\dagger\).

I.e. the operator \(H\) should be equal to its transpose conjugate. In mathematics, we have a name for such operators, and they are called Hermitian operators! (More about those in the appendix!). Now we can ask ourselves what happens when we apply \(U(\epsilon)\) to a quantum state \(\psi(t)\)? Well it’s simple to see now: \[U(\epsilon)\psi(t) = \psi(t+\epsilon) = \psi(t) -i \epsilon H\psi(t).\] With a little algebra we can rewrite the previous equation as:

\[ \frac{\psi(t+\epsilon) - \psi(t)}{\epsilon} = -i H\psi(t).\] Note that, in the limit \(\epsilon \to 0\), the left-hand side of this equation becomes a derivative: \[\frac{d}{dt}\psi(t) = -i H \psi(t).\] But this is the well-known Schrödinger equation! Note that, as computer scientists, we take the liberty of dropping the physical constant (\(\hbar\)) from the equation. What should be the takeaway of these observations? First, the Schrödinger equation is a differential equation whose solution is fully determined once we know the initial state of our system \(\psi(p)\). Formally, the solution can be written as:

\[\psi(p+t)=e^{-iHt}\psi(p).\]
From this last equation we can further observe (more on this in the appendix) that the exponential of a Hermitian matrix, \(e^{-iHt}\), which is *defined* through its Taylor expansion, is a *unitary* matrix: \(U(t)=e^{-itH}\). Unitary matrices are exactly those matrices that describe isometries: applying a unitary matrix to a vector does not change its length. From this, we see that the two quantum states \(\psi(p+t)\) and \(\psi(p)\) can be taken to be vectors of a fixed length, which, for practicality, we take to be unit vectors. Notation-wise, we denote unit vectors describing quantum states as “kets,” i.e. we rewrite this equation as:

\[|\psi(p+t)\rangle=U(t)|\psi(p)\rangle\] Hopefully, this introduction should be enough for getting a better intuition of what comes next, and give you a “justification” for the axioms of quantum mechanics.
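The claims above (that \(e^{-iHt}\) is unitary, preserves the length of \(|\psi(p)\rangle\), and behaves as \(I-i\epsilon H\) for small \(\epsilon\)) can be checked numerically. The following is a minimal sketch, assuming NumPy; the helpers `random_hermitian` and `evolution_operator` are hypothetical names, and the exponential is computed from the eigendecomposition of \(H\) rather than from the Taylor series.

```python
import numpy as np

def random_hermitian(d, seed=0):
    """Random d x d Hermitian matrix H = (A + A^dagger) / 2 (an illustrative choice)."""
    rng = np.random.default_rng(seed)
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (a + a.conj().T) / 2

def evolution_operator(h, t):
    """U(t) = exp(-i t H), computed from the eigendecomposition H = V diag(lambda) V^dagger."""
    eigvals, eigvecs = np.linalg.eigh(h)
    return eigvecs @ np.diag(np.exp(-1j * t * eigvals)) @ eigvecs.conj().T

h = random_hermitian(4)
u = evolution_operator(h, t=0.7)

# |psi(p)>: an arbitrary normalized initial state
psi_p = np.array([1, 1j, 0, -1], dtype=complex)
psi_p /= np.linalg.norm(psi_p)
psi_p_plus_t = u @ psi_p                                  # |psi(p+t)> = U(t)|psi(p)>

unitary_ok = np.allclose(u.conj().T @ u, np.eye(4))      # U^dagger U = I
norm_ok = np.isclose(np.linalg.norm(psi_p_plus_t), 1.0)  # the length is preserved
eps = 1e-6
# Continuity requirement: U(eps) = I - i eps H up to O(eps^2)
first_order_ok = np.allclose(evolution_operator(h, eps), np.eye(4) - 1j * eps * h, atol=1e-9)
```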

## 2.2 Axioms of quantum mechanics

The standard formalism used in Quantum Information is Dirac’s “bra-ket” notation, which we introduce in this section. We also recall here the postulates of quantum mechanics, and take this opportunity to settle the rest of the notation and preliminaries used in these lecture notes. For the postulates, we follow the standard formulation in (Nielsen and Chuang 2002).

**Proposition 2.1 (Postulate 1)** Associated to any isolated physical system is a complex vector space with inner product (that is, a Hilbert space) known as the state space of the system. The system is completely described by its state vector, which is a unit vector in the system’s state space.

As quantum states are described by unit vectors, we write \(|\psi\rangle\) for a unit vector \(\psi \in \mathcal{H}^n\). So for a non-normalized vector \(x \in \mathbb{R}^n\), the normalized quantum state is represented as \(|x\rangle = \left\lVert x\right\rVert^{-1}x = \frac{1}{\left\lVert x\right\rVert}\sum_{i\in[n]} x_i|i\rangle\). We denote as \(\{|i\rangle\}_{i\in[d]}\) the canonical (also called computational) basis for the \(d\)-dimensional Hilbert space \(\mathcal{H}\). The conjugate transpose of \(|x\rangle\) is denoted \(\langle x|\). We can think of \(|x\rangle\) as a column vector, while \(\langle x|\) is a row vector whose entries have been conjugated. In Dirac’s notation, we denote the inner product between two vectors as \(\langle x|y\rangle\). Their outer product is denoted as \(|x\rangle\langle y| = \sum_{i,j \in [d]}x_i y_j^\ast |i\rangle\langle j| \in \mathcal{H}^d\otimes \mathcal{H}^d\). The smallest quantum system is called a qubit, and is a 2-dimensional unit vector in \(\mathbb{C}^2\). A basis for this vector space in quantum notation is denoted as \(|0\rangle\) and \(|1\rangle\). In this case, the vector \(|\varphi\rangle = \alpha|0\rangle+\beta|1\rangle\) for \(\alpha,\beta\in \mathbb{C}\) represents a valid quantum state as long as \(|\alpha|^2 + |\beta|^2 = 1\).

**Proposition 2.2 (Postulate 2)** The evolution of a closed quantum system is described by a unitary transformation. That is, the state \(|\psi\rangle\) of the system at time \(t_1\) is related to the state \(|\psi'\rangle\) of the system at time \(t_2\) by a unitary operator \(U\) which depends only on the times \(t_1\) and \(t_2\).

A matrix \(U\in \mathbb{C}^{d \times d}\) is said to be unitary if \(UU^\dagger = U^\dagger U = I\), that is, if the inverse of \(U\) is equal to its conjugate transpose. It follows that unitary matrices are norm-preserving, and thus can be used as a suitable mathematical description of a pure quantum evolution. It is a standard exercise to see that the following are all equivalent definitions of unitary matrices (De Wolf 2019):

- \(\langle Uv, Uw\rangle = \langle v,w\rangle\) for all \(v,w\).
- \(\left\lVert Uv\right\rVert = \left\lVert v\right\rVert\) for all \(v\).
- \(\left\lVert Uv\right\rVert = 1\) if \(\left\lVert v\right\rVert=1\).
- \(U\) is a normal matrix with eigenvalues lying on the unit circle.
- The columns and the rows of \(U\) form an orthonormal basis of \(\mathbb{C}^d\).
- \(U\) can be written as \(e^{iH}\) for some Hermitian operator \(H\).

Every unitary matrix also satisfies \(|\det(U)|=1\), but this condition is only necessary, not sufficient, for unitarity, as the next example shows.

**Example 2.1 (\(|\det(U)| = 1\) is a necessary but not sufficient condition for being unitary.)** It is simple to see that the 2x2 diagonal matrix \(A\) with diagonal entries \(10\) and \(1/10\) has determinant 1, but it is not a unitary matrix: \(A^\dagger A = AA^\dagger = \mathrm{diag}(100, 1/100) \neq I\).

It will be useful to recall that if we have a unitary that performs the mapping \(|a_i\rangle \mapsto |b_i\rangle\), where \(\{|a_i\rangle\}\) and \(\{|b_i\rangle\}\) are orthonormal bases, we can write the “matrix” form of the operator as \(\sum_i |b_i\rangle\langle a_i|\).
Recall also that the Pauli matrices are both unitary *and* Hermitian, and this fact will be useful in many places throughout this text.
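Example 2.1 and the remark on Pauli matrices can be checked directly. The sketch below assumes NumPy; `is_unitary` is a hypothetical helper that tests \(M^\dagger M = MM^\dagger = I\) up to floating-point tolerance.

```python
import numpy as np

def is_unitary(m, atol=1e-12):
    """True if M^dagger M = M M^dagger = I up to numerical tolerance."""
    ident = np.eye(m.shape[0])
    return bool(np.allclose(m.conj().T @ m, ident, atol=atol)
                and np.allclose(m @ m.conj().T, ident, atol=atol))

a = np.diag([10.0, 1 / 10])                       # the matrix A of Example 2.1
pauli_x = np.array([[0, 1], [1, 0]], dtype=complex)
pauli_y = np.array([[0, -1j], [1j, 0]])
pauli_z = np.array([[1, 0], [0, -1]], dtype=complex)

det_a = np.linalg.det(a)                          # equals 1 up to rounding
a_is_unitary = is_unitary(a)                      # False: A^dagger A = diag(100, 1/100)
# Each Pauli matrix is both unitary and Hermitian
paulis_ok = all(is_unitary(p) and np.allclose(p, p.conj().T)
                for p in (pauli_x, pauli_y, pauli_z))
```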

**Exercise 2.1 (From (Huang, Bharti, and Rebentrost 2019))** Let \(k \in \{0,1\}^n\) be an arbitrary \(n\)-bitstring. Let \(A=(\sigma_{x}^{(1)})^{k_1} \otimes \dots \otimes (\sigma_{x}^{(n)})^{k_n}\) and \(|b\rangle=|0^n\rangle\). What is the solution to the equation \(A|x\rangle =|b\rangle\)?

**Proposition 2.3 (Postulate 3)** Quantum measurements are described by a collection \(\{M_m\}\) of measurement operators. These are operators acting on the state space of the system being measured. The index \(m\) refers to the measurement outcomes that may occur in the experiment. If the state of the quantum system is \(|\psi\rangle\) immediately before the measurement, then the probability that the result \(m\) occurs is given by \[ p(m) = \langle\psi|M^\dagger_m M_m |\psi\rangle \] and the state of the system after the measurement is \[ \frac{M_m|\psi\rangle}{\sqrt{\langle\psi|M_m^\dagger M_m|\psi\rangle}} \] The measurement operators satisfy the completeness equation \[ \sum_m M^\dagger _m M_m = I \]

In practice, we will mostly perform projective measurements (also called von Neumann measurements). A projective measurement is described by an *observable*: a Hermitian operator \(M\) on the state space of the system being observed. The observable has a spectral decomposition:
\[ M = \sum_m mP_m \]
where \(P_m\) is the projector onto the eigenspace of \(M\) associated with the eigenvalue \(m\). This means that the projectors \(P_m\) satisfy the following properties:

- \(P_m\) is positive semidefinite
- \(P_m\) is Hermitian
- \(\sum_m P_m = I\)
- \(P_mP_n = \delta_{mn}P_m\), i.e. they are mutually orthogonal projections.

Recall that an orthogonal projector \(P\) has the properties \(P=P^{\dagger}\) and \(P^2 = P\). Note that, in the list above, Hermiticity follows from positive semidefiniteness: every positive semidefinite operator on a complex vector space is Hermitian (this is not the case over \(\mathbb{R}\), where it is simple to find non-symmetric matrices \(A\) with \(x^TAx \geq 0\) for all \(x\)). Projective measurements can be understood as a special case of Postulate 3: in addition to satisfying the completeness relation \(\sum_m M_m^\dagger M_m = I\), the measurement operators are also orthogonal projectors. Given a state \(|\psi\rangle\), the probability of measuring outcome \(m\) is given by:

\[\begin{equation} p(m) = \langle\psi|P_m|\psi\rangle. \tag{2.1} \end{equation}\] If we were to measure outcome \(m\), then the state of the quantum system after the measurement would be: \[\frac{P_m|\psi\rangle}{\sqrt{p(m)}} .\]

Projective measurements have some useful properties. Just to cite one, the average value of a projective measurement in a state \(|\psi\rangle\) is: \[\begin{align} E(M)& = \sum_m m\, p(m)\\ & = \sum_m m \langle\psi|P_m|\psi\rangle\\ & = \langle\psi|\left(\sum_m mP_m\right)|\psi\rangle\\ & = \langle\psi|M|\psi\rangle \end{align}\] In practice, our projective operators will be projectors in the computational basis, i.e. \(P_m = |m\rangle\langle m|\) for \(m \in [d]\). From these rules, it is simple to see that the probability that a measurement on a state \(|x\rangle = \frac{1}{\left\lVert x\right\rVert}\sum_i x_i|i\rangle\) gives outcome \(i\) is \(|x_i|^2/\left\lVert x\right\rVert^2\).
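To make the rules above concrete, here is a small sketch (assuming NumPy, and a real vector \(x\) so that no complex conjugation is needed) that normalizes a vector, applies the Born rule of equation (2.1) with computational-basis projectors, compares \(\sum_m m\,p(m)\) with \(\langle x|M|x\rangle\), and computes a post-measurement state. The eigenvalues \(m \in [d]\) of the observable are an illustrative choice.

```python
import numpy as np

x = np.array([3.0, 0.0, 4.0, 0.0])     # unnormalized vector in R^4
ket_x = x / np.linalg.norm(x)           # |x> = x / ||x||
d = len(x)

# Computational-basis projectors P_m = |m><m|, m in [d]
basis = np.eye(d)
projectors = [np.outer(basis[m], basis[m]) for m in range(d)]
assert np.allclose(sum(projectors), np.eye(d))            # completeness

# Born rule, eq. (2.1): p(m) = <x|P_m|x>
probs = np.array([ket_x @ p @ ket_x for p in projectors])

# Observable M = sum_m m P_m (eigenvalue m for outcome m)
obs = sum(m * p for m, p in enumerate(projectors))
mean_from_probs = sum(m * pm for m, pm in enumerate(probs))  # sum_m m p(m)
mean_direct = ket_x @ obs @ ket_x                             # <x|M|x>

# Post-measurement state after observing outcome m = 2
post_state = projectors[2] @ ket_x / np.sqrt(probs[2])
```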

**Proposition 2.4 (Postulate 4)** The state space of a composite physical system is the tensor product of the state spaces of the component physical systems. Moreover, if we have systems numbered from 1 through \(n\), and system \(i\) is prepared in the state \(|\psi_i\rangle\), the joint state of the total system is \(\bigotimes_{i=1}^n |\psi_i\rangle=|\psi_1\rangle|\psi_2\rangle\dots |\psi_n\rangle\).

To describe two different quantum systems together, we use the tensor product. The tensor product of two vectors \(|x\rangle \in \mathbb{R}^{d_1}\) and \(|y\rangle \in \mathbb{R}^{d_2}\) is a vector \(|z\rangle = |x\rangle\otimes|y\rangle \in \mathbb{R}^{d_1 d_2}\). We can use the tensor product to describe the joint evolution of separate quantum systems.

Even if it’s not explicitly used much in quantum algorithms, it’s useful to recall the definition of entangled pure state.

**Definition 2.1 (Entangled state)** A quantum state that cannot be expressed as a tensor product of two quantum states is said to be entangled.

The same thing can be done for operators. Let \(U_1\) be the unitary describing the evolution of a quantum state \(|x\rangle\) and \(U_2\) the unitary describing the evolution of a quantum state \(|y\rangle\). Then \(U_1 \otimes U_2\) describes the evolution of the quantum system \(|x\rangle\otimes |y\rangle\). Note that to build a state \(|v\rangle \in \mathcal{H}^n\) we need \(\lceil \log n\rceil\) qubits, and this fact will be extensively leveraged in our quantum algorithms.
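The tensor-product facts above, and Definition 2.1, can be illustrated with NumPy's `np.kron`. This is a sketch under our own choices: to test whether a two-register pure state is a product state we use the Schmidt rank (the rank of the \(d_1 \times d_2\) matrix of coefficients), a standard criterion that is not introduced in these notes; the helpers `schmidt_rank` and `qubits_needed` are hypothetical.

```python
import math
import numpy as np

def schmidt_rank(state, d1, d2, tol=1e-10):
    """Rank of the d1 x d2 coefficient matrix of a bipartite pure state (1 means product state)."""
    singular_values = np.linalg.svd(state.reshape(d1, d2), compute_uv=False)
    return int(np.sum(singular_values > tol))

def qubits_needed(n):
    """Number of qubits needed to hold a vector of dimension n: ceil(log2 n)."""
    return math.ceil(math.log2(n))

ket0, ket1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
plus = (ket0 + ket1) / np.sqrt(2)

product = np.kron(plus, ket0)                                     # |+>|0>, lives in R^(2*2)
bell = (np.kron(ket0, ket0) + np.kron(ket1, ket1)) / np.sqrt(2)   # (|00> + |11>)/sqrt(2)

# (U1 ⊗ U2)(|x> ⊗ |y>) = (U1|x>) ⊗ (U2|y>)
u1 = np.array([[0.0, 1.0], [1.0, 0.0]])                 # Pauli X
u2 = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)   # Hadamard
joint_then_apply = np.kron(u1, u2) @ np.kron(plus, ket0)
apply_then_joint = np.kron(u1 @ plus, u2 @ ket0)
```

The Bell state has Schmidt rank 2, so it cannot be written as a tensor product and is entangled, while \(|+\rangle|0\rangle\) has rank 1.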

### 2.2.1 Review of important statements in quantum computation

Before delving into a review of quantum algorithms, we would like to state here a few important lemmas.

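**Lemma 2.1 (Hadamard on a bitstring (Nielsen and Chuang 2002))** Let \(x \in \{0,1\}^n\) be a bitstring, and \(H\) be a Hadamard gate. Then:
\[H^{\otimes n}|x\rangle = \frac{1}{\sqrt{2^n}}\sum_{j \in \{0,1\}^n}(-1)^{x\cdot j}|j\rangle,\]
where \(x\cdot j = \sum_{p} x_p j_p \bmod 2\).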
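The identity \(H^{\otimes n}|x\rangle = \frac{1}{\sqrt{2^n}}\sum_{j}(-1)^{x\cdot j}|j\rangle\) can be verified exhaustively for small \(n\). The sketch below assumes NumPy and orders qubits with the first bit as the most significant one (matching `np.kron`); the helper names are hypothetical.

```python
import itertools
import numpy as np

def hadamard_n(n):
    """H^{⊗n} built by repeated Kronecker products (first factor = most significant bit)."""
    h = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
    out = np.array([[1.0]])
    for _ in range(n):
        out = np.kron(out, h)
    return out

def basis_state(bits):
    """Computational basis vector |x> for a tuple of bits, most significant bit first."""
    vec = np.zeros(2 ** len(bits))
    vec[int("".join(map(str, bits)), 2)] = 1.0
    return vec

def lemma_rhs(x):
    """(1/sqrt(2^n)) sum_j (-1)^{x.j} |j>, with x.j the bitwise inner product mod 2."""
    n = len(x)
    amps = [(-1) ** (sum(xi * ji for xi, ji in zip(x, j)) % 2)
            for j in itertools.product([0, 1], repeat=n)]
    return np.array(amps) / np.sqrt(2 ** n)

n = 3
# Largest deviation between H^{⊗n}|x> and the right-hand side, over all x in {0,1}^n
max_err = max(np.max(np.abs(hadamard_n(n) @ basis_state(x) - lemma_rhs(x)))
              for x in itertools.product([0, 1], repeat=n))
```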

## 2.3 Measuring complexity of quantum algorithms

This section is an attempt to organize in a coherent way some fundamental concepts in quantum computer science. The formalization of some of these concepts comes from (Dörn 2008) and (De Wolf 2019).

There are various ways to measure the complexity of a quantum algorithm. We denote with \(T(U)\) the time complexity needed to implement \(U\), measured in terms of **number of gates** of the circuit. This is a concept that bears some similarity with the clock rate of classical CPUs.

We use the standard notation \(\widetilde{O}\) to hide polylogarithmic factors in the big-O notation of the algorithms, e.g. \(O(\log(n))=\widetilde{O}(1)\).

**Definition 2.2 (Quantum query or oracle access to a function)** Let \(\mathcal{H}\) be a finite-dimensional Hilbert space with basis \(\{0,1\}^{n}\). Given \(f:\{0,1\}^n\to\{0,1\}^m\), we say that we have quantum query access to \(f\) if we have access to a unitary operator \(U_f\) on \(\mathcal{H}\otimes\mathbb{C}^{2^m}\) such that \(U_f|x\rangle|b\rangle = |x\rangle|b \oplus f(x)\rangle\) for any bit string \(b\in\{0,1\}^m\). One application of \(U_f\) costs \(T_f\) operations.
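For small \(n\) and \(m\), the oracle of Definition 2.2 can be written down explicitly as a permutation matrix, which makes its unitarity evident; since XOR-ing \(f(x)\) twice cancels, it is also its own inverse. This is a sketch assuming NumPy; `oracle_matrix` is a hypothetical helper, \(f\) is encoded as a Python function on integers, and the input register is taken as the most significant one.

```python
import numpy as np

def oracle_matrix(f, n, m):
    """Permutation matrix of U_f |x>|b> = |x>|b XOR f(x)> on n + m qubits.

    f maps an integer x in [0, 2^n) to an integer in [0, 2^m); the basis state |x>|b>
    has index x * 2^m + b.
    """
    dim = 2 ** (n + m)
    u = np.zeros((dim, dim))
    for x in range(2 ** n):
        for b in range(2 ** m):
            u[x * 2 ** m + (b ^ f(x)), x * 2 ** m + b] = 1.0
    return u

# Hypothetical example: f(x) = x mod 4, with n = 3 input bits and m = 2 output bits
u_f = oracle_matrix(lambda x: x % 4, n=3, m=2)
```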

**Definition 2.3 (Quantum computation in the query model)** Let \(O_x\) be a unitary operator that encodes the input of our computation, and acts in a non-trivial way on its associated Hilbert space.
A quantum computation with \(T\) queries to an oracle \(O_x : |i,b,z\rangle \mapsto |i,b \oplus x_i, z\rangle\) is a sequence of unitary transformations:
\[U_T O_x U_{T-1} O_x \cdots U_1 O_x U_0,\]
where the unitaries \(U_j\) do not depend on the input \(x\).

Note that the second register holds the XOR of the \(i\)-th component of the input with the previous state of the register (i.e. \(b\)). This makes the computation reversible. Importantly, Definition 2.2 is just one example of a function to which we can have query access. We can also assume query access to unitaries that create various kinds of quantum states as output. We will see many examples of such oracles, e.g. Definitions 3.3, 3.6, 3.7, and 3.8.

This is the so-called query model, or oracle model, of computation. The important point here is the last statement of Definition 2.2, about the cost \(T_f\) of applying \(U_f\): in the query model, each application of \(U_f\) is counted as a single query, i.e. as having cost \(O(1)\). There are multiple reasons for working in this model. First, it is often the case that queries to these oracles are actually efficient (as we will see in many examples), so the query complexity is equivalent (up to multiplicative polylogarithmic factors) to the depth of the quantum circuit that is going to be executed. Another reason is that in the oracle model it is relatively simple to prove lower bounds and results about the complexity of an algorithm in terms of the number of queries to an oracle that encodes the input of the problem. For complex results in quantum algorithms, it is customary to separate the study of the query complexity of the problem from the depth of the quantum circuit which is executed on the real hardware. We formalize this difference in the following definitions.

**Definition 2.4 (Query complexity)** The quantum query complexity of a quantum algorithm \(\mathcal{A}\) is the number of queries to a black-box made by \(\mathcal{A}\) in order to compute \(f\).

If we only care about the **relativized** complexity, we might compare two algorithms that solve the same problem in terms of the number of queries they make to a given oracle, and observe that one is faster than the other. This is a **relativized** speedup. The opposite is an **absolute** speedup, where we also take into account the complexity of the operations that are **not** queries to an oracle. In the case of quantum algorithms, this might simply be the gate depth of the circuit.

**Definition 2.5 (Circuit complexity or time complexity)** The quantum circuit complexity (or time complexity) of a quantum algorithm \(\mathcal{A}\) is the depth of the quantum circuit implementing \(\mathcal{A}\).

Quantum computing is not the only place where we measure the complexity in terms of query to an oracle. In fact, it’s sufficient to do a few “queries” (pun intended) on your search engine to realize that in many computational models we have adopted this measure of computational complexity.

**Note that the query complexity of an algorithm is a lower bound on the gate complexity of the quantum circuit.** It is often simpler to study first the query complexity of a quantum algorithm and then study the time complexity. For most quantum algorithms (but not all!) the time complexity coincides with the query complexity, up to a logarithmic factor. Note that, if we find a way to have an oracle whose depth (i.e. circuit complexity) is only (poly)logarithmic in the input size, then the query complexity and the gate complexity coincide up to a negligible polylogarithmic factor.
There are some exceptions. Most notably, there is a quantum algorithm for the important *hidden subgroup problem* with only polynomial query complexity, while the classical counterpart has a query complexity that is exponential in the input size. Nevertheless, the overall time complexity of the quantum algorithm is (to date) still exponential, and polynomial-time quantum algorithms are known only for a few specializations of the problem.

We now clarify some definitions that are used to describe the probabilistic behavior of an algorithm:

**Definition 2.6 (Kind of randomized algorithms)** Let \(f : \{0,1\}^N \mapsto \{0,1\}\) be a Boolean function. An algorithm computes \(f\):

- **exactly** if the output equals \(f(x)\) with probability 1 for all \(x \in\{0,1\}^N\);
- with **zero error** if it is allowed to give the answer “UNDEFINED” with probability smaller than \(1/2\) (but if the output is \(0\) or \(1\) it must be correct);
- with **bounded error** if the output equals \(f(x)\) with probability greater than \(2/3\) for all \(x\in \{0,1\}^N\).

A bounded-error (quantum or classical) algorithm that fails with probability \(1/3\) (or any other constant smaller than \(1/2\)) is meant to fail *in the worst case*: the bound on the failure probability must hold for every input. It does not mean that the algorithm fails in the average case, i.e. on some fraction of the inputs (see the Appendix of (De Wolf 2019)).

If a (quantum or classical) algorithm is said to output the right answer in **expected** (often said “in expectation”) running time \(T\), we can quickly create another algorithm that has **worst-case** guarantees on the runtime. This is obtained using Markov’s inequality (Theorem C.2) as follows. Run the algorithm for \(kT\) steps, i.e. stop the execution after \(kT\) steps if it has not terminated already. If \(X\) is the random variable of the runtime of the computation (so \(\mathbb{E}[X]=T\)), then:

\[Pr\left[X > kT \right] \leq \frac{1}{k} \] So with probability \(\geq 1-\frac{1}{k}\) we will have the output of the algorithm.
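A quick Monte Carlo sanity check of this argument, as a sketch assuming NumPy: we model the running time of a hypothetical algorithm as an exponential random variable with mean \(T\) (an arbitrary assumption; Markov's inequality holds for any non-negative runtime distribution) and measure how often a run would be cut off at \(kT\) steps.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
T = 10.0     # expected running time of the (hypothetical) algorithm
k = 4        # stop the execution after k*T steps

# Assumed runtime distribution: exponential with mean T
runtimes = rng.exponential(scale=T, size=100_000)
empirical_failure = np.mean(runtimes > k * T)   # fraction of runs cut off before finishing
markov_bound = 1 / k                            # Pr[X > kT] <= 1/k
```

For this particular distribution the empirical failure rate is far below \(1/k\), since Markov's inequality is a worst-case bound over all runtime distributions with the given mean.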

## 2.4 Review of famous quantum algorithms

In this section we will explore some introductory quantum algorithms. Some of them are not particularly related to data analysis or machine learning, but given their potential to help us better understand the model of quantum computation that we adopt, we decided it was important to report them here. Others will prove to be really useful subroutines for the quantum machine learning practitioner.

### 2.4.1 Deutsch-Jozsa

**Definition 2.7 (Constant function)** A function \(f :\{0,1\}^n \mapsto \{0,1\}\) is constant if \(f(x)=0 \ \forall x \in \{0,1\}^n\) or \(f(x)=1 \ \forall x \in \{0,1\}^n\).

**Definition 2.8 (Balanced function)** A function \(f :\{0,1\}^n \mapsto \{0,1\}\) is balanced if \(f(x)=0\) for half of the inputs and \(f(x)=1\) for the other half.

**Theorem 2.1 (Deutsch-Jozsa (Deutsch and Jozsa 1992))** Assume we have quantum access (as in Definition 2.2) to a unitary \(U_f\) that computes a function \(f :\{0,1\}^n \mapsto \{0,1\}\), which we are promised is either constant or balanced. There is a quantum algorithm that decides which is the case with probability \(1\), using \(U_f\) only once and using \(O(n)\) other gates.

*Proof.* We start our quantum computer by initializing \(n\) qubits in the state \(|0\rangle\), followed by a single ancilla qubit initialized in the state \(|1\rangle\), which we will use for the phase kickback. Then, we apply a Hadamard gate on each of them. Mathematically, we are performing the following mapping:

\[\begin{equation} |0\rangle^{\otimes n}|1\rangle \mapsto \left(\frac{1}{\sqrt{2^n}}\sum_{x\in\{0,1\}^n} |x\rangle \right)|-\rangle \end{equation}\] Now we apply \(U_f\) using the first register as input and the ancilla register as output. Since \(U_f|x\rangle|-\rangle = (-1)^{f(x)}|x\rangle|-\rangle\) (phase kickback), our quantum computer is now in the state \[\left(\frac{1}{\sqrt{2^n}}\sum_{x\in\{0,1\}^n}(-1)^{f(x)}|x\rangle \right)|-\rangle\] Now we apply \(n\) Hadamard gates to the \(n\) qubits in the first register. Recalling Lemma 2.1, this gives the state

\[\left(\frac{1}{2^n} \sum_{x\in\{0,1\}^n}(-1)^{f(x)} \sum_{j \in \{0,1\}^n }(-1)^{xj} |j\rangle \right) |-\rangle = \left(\frac{1}{2^n} \sum_{x\in\{0,1\}^n}\sum_{j \in \{0,1\}^n} (-1)^{f(x)+ xj} |j\rangle \right)|-\rangle\]

In this state, note that the normalization factor has changed from \(\frac{1}{\sqrt{2^n}}\) to \(\frac{1}{2^n}\), and recall that \((-1)^{xj}\) is read as \((-1)^{ \sum_{p} x_pj_p \bmod 2 }\). The key idea of the proof of this algorithm lies in asking the right question about the previous state: what is the probability of measuring the state \(|0\rangle^{\otimes n}\) in the first register? The answer to this question concludes the proof of the theorem. Before looking at the probability, observe that the amplitude of the state \(|j=0\rangle\) is just \(\frac{1}{2^n}\sum_{x}(-1)^{f(x)}\), as \(x^Tj=0\) for all \(x\) if \(j=0_1\dots 0_n\). Then,

\[\begin{equation} \frac{1}{2^n} \sum_{x \in \{0,1\}^n } (-1)^{f(x)} = \begin{cases} 1 & \text{if } f(x)=0 \ \forall x \\ -1 & \text{if } f(x)=1 \ \forall x \\ 0 & \text{if } f \text{ is balanced} \end{cases} \end{equation}\]

To conclude, note that if the function \(f\) is constant (first two cases), we will measure \(|0\rangle^{\otimes n}\) with probability \(1\), and if the function is balanced, we will measure \(|0\rangle^{\otimes n}\) with probability \(0\), i.e. we will always obtain some bitstring of \(n\) bits that is different from the string \(0_1\dots 0_n\).
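The proof can be mirrored by a small statevector simulation. The sketch below assumes NumPy and takes a shortcut: instead of simulating the ancilla, it starts from the phase-kickback form \(\frac{1}{\sqrt{2^n}}\sum_x(-1)^{f(x)}|x\rangle\) of the first register (the \(|-\rangle\) ancilla is not entangled with it), applies \(H^{\otimes n}\), and returns the probability of \(|0\rangle^{\otimes n}\). The example functions are hypothetical.

```python
import numpy as np

def deutsch_jozsa_p0(f, n):
    """Probability of measuring |0^n> at the end of Deutsch-Jozsa (statevector simulation).

    Uses the phase-kickback form of the first register, (1/sqrt(2^n)) sum_x (-1)^{f(x)} |x>,
    and leaves out the |-> ancilla.
    """
    h = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
    h_n = np.array([[1.0]])
    for _ in range(n):
        h_n = np.kron(h_n, h)                          # H^{⊗n}
    register = np.array([(-1) ** f(x) for x in range(2 ** n)]) / np.sqrt(2 ** n)
    final = h_n @ register
    return abs(final[0]) ** 2                          # squared amplitude of |0...0>

n = 4
p_const = deutsch_jozsa_p0(lambda x: 1, n)                        # constant f
p_parity = deutsch_jozsa_p0(lambda x: bin(x).count("1") % 2, n)   # balanced: parity of x
p_half = deutsch_jozsa_p0(lambda x: int(x >= 2 ** (n - 1)), n)    # balanced: first bit of x
```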

It is simple to see that if we want to solve this problem with a classical *deterministic* algorithm, we need \(2^n/2 + 1\) queries in the worst case. However, with a randomized algorithm we can drastically reduce the number of queries by admitting a small probability of failure.

**Exercise 2.2** Can you think of an efficient randomized classical algorithm for solving this problem? Perhaps you can use the tools in the Appendix for randomized algorithms.

We now turn our attention to the first learning problem of this book. It is rarely stressed that the following algorithm can be interpreted as a learning algorithm.

### 2.4.2 Bernstein-Vazirani

**Theorem 2.2 (Bernstein-Vazirani)** Assume we have quantum access (as in Definition 2.2) to a unitary \(U_f\) that computes the function \(f_a :\{0,1\}^n \mapsto \{0,1\}\) defined as \(f_a(x) = (x,a) = \left( \sum_{i=1}^n x_i a_i \right)\bmod 2\) for a secret string \(a \in \{0,1\}^n\). There is a quantum algorithm that learns \(a\) with probability \(1\), using \(U_f\) only once and \(O(n)\) other gates.

*Proof.* The algorithm follows exactly the same steps as the Deutsch-Jozsa algorithm. The proof is slightly different, and starts by noting that, after the application of the oracle \(U_f\), the registers of our quantum computer are in the following state:

\[\left(\frac{1}{\sqrt{2^n}} \sum_{x\in\{0,1\}^n}(-1)^{f(x)} |x\rangle \right) |-\rangle = \left(\frac{1}{\sqrt{2^n}} \sum_{x\in\{0,1\}^n} (-1)^{a^T x}|x\rangle \right)|-\rangle\]

Now we resort again to Lemma 2.1, and we use the fact that the Hadamard gate is both unitary and Hermitian, hence it is its own inverse (\(H^2 = I\)). By Lemma 2.1, the state of the first register is exactly \(H^{\otimes n}|a\rangle\). Thus applying \(n\) Hadamard gates to the first register leads to the state \(|a\rangle\) deterministically.
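A statevector sketch of the Bernstein-Vazirani algorithm, assuming NumPy and, as for the Deutsch-Jozsa sketch, starting from the phase-kickback form of the first register with the \(|-\rangle\) ancilla left out. The function names and the secret string below are hypothetical; \(a\cdot x \bmod 2\) is computed as the parity of the bitwise AND of the two integers.

```python
import numpy as np

def hadamard_n(n):
    """H^{⊗n} as a dense matrix (first factor = most significant qubit)."""
    h = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
    out = np.array([[1.0]])
    for _ in range(n):
        out = np.kron(out, h)
    return out

def bernstein_vazirani(a_bits):
    """Simulate the first register of Bernstein-Vazirani for the secret string a_bits.

    After the oracle, the register is (1/sqrt(2^n)) sum_x (-1)^{a.x} |x>. Returns the
    most likely measured bitstring and the full outcome distribution.
    """
    n = len(a_bits)
    a = int("".join(map(str, a_bits)), 2)
    # a.x mod 2 equals the parity of the bitwise AND of the two integers
    phases = np.array([(-1) ** (bin(a & x).count("1") % 2) for x in range(2 ** n)])
    final = hadamard_n(n) @ (phases / np.sqrt(2 ** n))
    probs = np.abs(final) ** 2
    outcome = int(np.argmax(probs))        # the distribution is a point mass on |a>
    return [int(bit) for bit in format(outcome, f"0{n}b")], probs

bits, probs = bernstein_vazirani([1, 0, 1, 1])
```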

**Exercise 2.3** Can you think of an efficient randomized classical algorithm for solving the Bernstein-Vazirani problem? You can use the tools in the Appendix for randomized algorithms.

Other material for learning about the Deutsch-Jozsa and Bernstein-Vazirani algorithms can be found in the lecture notes of Ronald de Wolf.

### 2.4.3 Hadamard test

Let \(U\) be a unitary acting on \(n\) qubits, and \(|\psi\rangle\) a quantum state on \(n\) qubits (generated by another unitary \(V\)). We also require the ability to apply the controlled version of the unitary \(U\). Then, the Hadamard test is a quantum circuit that we can use to estimate the value of \(\langle\psi| U |\psi\rangle\). The circuit is very simple: it consists of a Hadamard gate applied on an ancilla qubit, the application of \(U\) controlled on the ancilla, and another Hadamard gate on the ancilla.

The initial operation leads to \((H\otimes V) |0\rangle|0\rangle = |+\rangle|\psi\rangle\), then we have: \[\begin{align} |\psi_{\rm{final}}\rangle & =(H\otimes I)(cU )|+\rangle|\psi\rangle = (H\otimes I)\frac{1}{\sqrt{2}}\left(|0\rangle|\psi\rangle+|1\rangle U|\psi\rangle \right) \\ & =\frac{1}{2}\left(|0\rangle\left(|\psi\rangle + U|\psi\rangle \right) + |1\rangle\left(|\psi\rangle - U|\psi\rangle \right) \right) \end{align}\]

Note that the last state can be written equivalently, by factoring out the state \(|\psi\rangle\), as \(|\psi_{\rm{final}}\rangle=\frac{1}{2}\left(|0\rangle(I+U)|\psi\rangle + |1\rangle(I-U)|\psi\rangle \right)\). The probability of measuring \(0\) in the first qubit is: \[\begin{align} p(0)= & \left\lVert\frac{1}{2}(I+U)|\psi\rangle \right\rVert_2^2 = \frac{1}{4} \left(\langle\psi| + \langle\psi|U^\dagger \right)\left(|\psi\rangle + U|\psi\rangle \right) \\ =& \frac{2+\langle\psi|(U+U^\dagger)|\psi\rangle}{4} = \frac{2+2\rm{Re}(\langle\psi|U|\psi\rangle)}{4} \end{align}\]

where we used Postulate 3 (Proposition 2.3) with the projector \(|0\rangle\langle 0|\otimes I\). The probability of measuring \(1\) in the first register follows immediately as \(p(1)=1-p(0)\).
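The derivation can be checked with a dense-matrix simulation of the circuit. This sketch assumes NumPy, starts directly from \(|\psi\rangle\) instead of simulating the preparation unitary \(V\), places the ancilla as the most significant qubit, and draws a random unitary from the QR decomposition of a complex Gaussian matrix; all names are hypothetical.

```python
import numpy as np

def controlled(u):
    """Controlled-U with the ancilla as the most significant qubit: |0><0|⊗I + |1><1|⊗U."""
    d = u.shape[0]
    return np.block([[np.eye(d), np.zeros((d, d))], [np.zeros((d, d)), u]])

def hadamard_test_p0(u, psi):
    """Probability of reading 0 on the ancilla in the (real-part) Hadamard test."""
    d = u.shape[0]
    h = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
    h_anc = np.kron(h, np.eye(d))                   # H on the ancilla only
    state = np.kron(np.array([1.0, 0.0]), psi)      # |0>|psi>
    state = h_anc @ controlled(u) @ h_anc @ state   # H, controlled-U, H
    return np.linalg.norm(state[:d]) ** 2           # amplitudes with ancilla = 0

rng = np.random.default_rng(7)
q, _ = np.linalg.qr(rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4)))  # random unitary
psi = rng.normal(size=4) + 1j * rng.normal(size=4)
psi /= np.linalg.norm(psi)

p0 = hadamard_test_p0(q, psi)
predicted = (2 + 2 * np.real(psi.conj() @ q @ psi)) / 4
```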

**Exercise 2.4** Can you tell what the expected value of the observable \(Z\) on the ancilla qubit is? Remember that the possible outcomes of the observable \(Z\) are \(\{+1, -1\}\).

However, we might be interested in the imaginary part of \(\langle\psi|U|\psi\rangle\). To estimate it, we need to slightly change the circuit. After the first Hadamard gate, we apply on the ancilla qubit the phase gate \(S^\dagger = \mathrm{diag}(1, -i)\), which multiplies the state \(|1\rangle\) by a phase of \(-i\). To get the intuition behind this, recall that the imaginary part of a complex number \(z=a+ib\) is defined as \(\rm{Im}(z)= \frac{z-z^\ast}{2i}=\frac{i(z-z^\ast)}{-2}= \frac{-2b}{-2} =b\), where, after the definition, we multiplied numerator and denominator by \(i\), a trick that we will use later. The rest of the circuit of the Hadamard test stays the same. The evolution of the state in our quantum computer is the following:

\[\begin{align} |\psi_{\rm{final}}\rangle& =(H\otimes I)(cU )\frac{1}{\sqrt{2}}\left( |0\rangle - i|1\rangle\right)|\psi\rangle = (H\otimes I)\frac{1}{\sqrt{2}}\left(|0\rangle|\psi\rangle-i|1\rangle U|\psi\rangle \right) \\ & = \frac{1}{2}\left(|0\rangle\left(|\psi\rangle -iU|\psi\rangle \right) + |1\rangle\left(|\psi\rangle + i U|\psi\rangle \right) \right) \end{align}\]

The probability of measuring \(0\) is given by the following equation.

\[\begin{align} p(0)=\frac{1}{4}\left(\langle\psi|+i\langle\psi|U^\dagger \right)\left(|\psi\rangle-iU|\psi\rangle \right) = \frac{1}{4} \left(2 - i\langle\psi|U|\psi\rangle + i \langle\psi|U^\dagger|\psi\rangle \right) \end{align}\]

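Note that when taking the conjugate transpose of our state, we changed the sign of \(i\). We now only have to convince ourselves that \(-i\langle\psi|U|\psi\rangle + i \langle\psi|U^\dagger|\psi\rangle = i\langle\psi|(U^\dagger -U)|\psi\rangle\) is indeed the real number \(2\rm{Im}(\langle\psi| U|\psi\rangle)\). Writing \(z=\langle\psi|U|\psi\rangle\), we have \(\langle\psi|U^\dagger|\psi\rangle = z^\ast\), and thus \(i(z^\ast - z) = -i \cdot 2i\,\rm{Im}(z) = 2\rm{Im}(z)\). Hence \(p(0) = \frac{2+2\rm{Im}(\langle\psi|U|\psi\rangle)}{4}\), which is a valid probability.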
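The same kind of simulation confirms the imaginary-part variant. As before, this is a sketch assuming NumPy, with the ancilla as the most significant qubit and \(|\psi\rangle\) given directly; the phase gate is taken to be \(S^\dagger=\mathrm{diag}(1,-i)\), so that the ancilla is prepared in \((|0\rangle - i|1\rangle)/\sqrt{2}\). The diagonal unitary and the state are arbitrary illustrative choices.

```python
import numpy as np

def hadamard_test_imag_p0(u, psi):
    """Probability of reading 0 on the ancilla when S^dagger = diag(1, -i) follows the first H."""
    d = u.shape[0]
    h = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
    s_dag = np.diag([1.0, -1j])
    cu = np.block([[np.eye(d), np.zeros((d, d))], [np.zeros((d, d)), u]])  # controlled-U
    state = np.kron(np.array([1.0, 0.0]), psi)         # |0>|psi>
    state = np.kron(s_dag @ h, np.eye(d)) @ state      # ancilla -> (|0> - i|1>)/sqrt(2)
    state = cu @ state
    state = np.kron(h, np.eye(d)) @ state              # final H on the ancilla
    return np.linalg.norm(state[:d]) ** 2

# Example: a diagonal one-qubit unitary and an arbitrary normalized state
u = np.diag(np.exp(1j * np.array([0.3, 1.1])))
psi = np.array([0.6, 0.8j])
p0 = hadamard_test_imag_p0(u, psi)
predicted = (2 + 2 * np.imag(psi.conj() @ u @ psi)) / 4
```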

**Exercise 2.5** Can you check whether the phase gate that we apply after the first Hadamard gate can be applied before the last Hadamard gate instead?

### 2.4.4 Modified Hadamard test

In this section we complicate the results obtained in the previous one a little, by finding the number of samples that we need to draw from a circuit in order to estimate the expected value or the probability of interest with a certain level of accuracy and with a certain probability.

**Theorem 2.3 (Modified Hadamard test (no amplitude amplification))** Assume we have access to a unitary \(U_1\) that produces a state \(U_1 |0\rangle = |\psi_1\rangle\) and a unitary \(U_2\) that produces a state \(|\psi_2\rangle\), where \(|\psi_1\rangle,|\psi_2\rangle \in \mathbb{R}^N\) for \(N=2^n, n\in\mathbb{N}\). There is a quantum algorithm that allows us to estimate the quantity \(\langle\psi_1|\psi_2\rangle\) with additive precision \(\epsilon\) using controlled applications of \(U_1\) and \(U_2\) \(O(\frac{\log(1/\delta)}{\epsilon^2})\) times, with probability \(1-\delta\).

*Proof.* Create a state \(|0\rangle|0\rangle\) where the first register is just an ancilla qubit, and the second register has \(n\) qubits. Then, apply a Hadamard gate to the first qubit, so as to obtain \(|+\rangle|0\rangle\). Then, controlled on the first register being \(0\), we apply the unitary \(U_1\) to the second register, and controlled on the first register being \(1\), we apply the unitary \(U_2\). Then, we apply the Hadamard gate on the ancilla qubit again. The state that we obtain is the following:

\[\begin{align} (H\otimes I ) \frac{1}{\sqrt{2}}\left( |0\rangle|\psi_1\rangle + |1\rangle|\psi_2\rangle \right) \\ = \frac{1}{2}\left(|0\rangle(|\psi_1\rangle + |\psi_2\rangle) + |1\rangle(|\psi_1\rangle - |\psi_2\rangle) \right) \end{align}\]

Again, now it is easy to state that the probability of measuring \(0\) is:

\[\begin{equation} p(0)=\frac{2+2\rm{Re}[\langle\psi_1|\psi_2\rangle]}{4} \end{equation}\]

We conclude the proof by recalling the Chernoff bound in Theorem C.8, as in the proof of the swap test (Theorem 2.4).
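The sampling argument can be illustrated by simulating measurement outcomes of the ancilla. This sketch assumes NumPy and random real states; since the exact constants of Theorem C.8 are not reproduced here, it uses the two-sided Hoeffding bound \(t \geq \ln(2/\delta)/(2\epsilon^2)\) as a stand-in. Note that estimating \(p(0)\) with error \(\epsilon\) gives \(\langle\psi_1|\psi_2\rangle = 2p(0)-1\) with error \(2\epsilon\), which only changes constants.

```python
import numpy as np

rng = np.random.default_rng(3)

def random_real_state(n_qubits):
    """Hypothetical helper: a random normalized vector in R^(2^n)."""
    v = rng.normal(size=2 ** n_qubits)
    return v / np.linalg.norm(v)

psi1, psi2 = random_real_state(3), random_real_state(3)

# Final state (|0>(psi1 + psi2) + |1>(psi1 - psi2)) / 2, so p(0) = ||psi1 + psi2||^2 / 4
p0 = np.linalg.norm(psi1 + psi2) ** 2 / 4

eps, delta = 0.02, 0.01
# Hoeffding: t >= ln(2/delta) / (2 eps^2) samples give |p0_hat - p0| <= eps w.p. >= 1 - delta
t = int(np.ceil(np.log(2 / delta) / (2 * eps ** 2)))
outcomes = rng.random(t) < p0          # simulated ancilla measurements, True = outcome 0
p0_hat = outcomes.mean()
overlap_hat = 2 * p0_hat - 1           # estimate of <psi1|psi2>
```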

Can you think of reasons that might lead one to prefer the swap test over the Hadamard test, or vice versa? At the end of the day, are they computing the same thing? For instance, note that the Hadamard test requires the ability to call the *controlled* versions of the unitaries \(U_1\) and \(U_2\), while the swap test can treat them as black boxes: the states can come from a quantum process, or from a quantum communication channel.

### 2.4.5 Swap test

The swap test was originally proposed in (Buhrman et al. 2001) in the context of quantum fingerprinting, but it has quickly been extended to many other contexts. For us, the swap test is a way to obtain an estimate of the inner product between two quantum states. The difference between the swap test and the Hadamard test is that in this case we do not assume access to the unitaries creating the states, hence we cannot perform controlled operations with them. You can think that we receive the states from a third party, e.g. via a communication protocol.

**Theorem 2.4 (Swap test (no amplitude amplification))** Assume we have access to a unitary \(U_1\) that produces a state \(U_1 |0\rangle = |\psi_1\rangle\) and a unitary \(U_2\) that produces a state \(|\psi_2\rangle\), where \(|\psi_1\rangle,|\psi_2\rangle \in \mathbb{R}^N\) for \(N=2^n, n\in\mathbb{N}\). There is a quantum algorithm that allows us to estimate the quantity \(|\langle\psi_1|\psi_2\rangle|^2\) with additive precision \(\epsilon\) using \(U_1\) and \(U_2\) \(O(\frac{\log(1/\delta)}{\epsilon^2})\) times, with probability \(1-\delta\).

*Proof.* Create a state \(|0\rangle|0\rangle|0\rangle\) where the first register is just an ancilla qubit, and the second and third registers have \(n\) qubits. Then, apply a Hadamard gate to the first qubit, so as to obtain \(|+\rangle|0\rangle|0\rangle\). Then, apply \(U_1\) and \(U_2\) to the second and third registers, and then apply a swap gate controlled on the ancilla qubit, targeting the two registers. More precisely, we apply \(n\) controlled swap gates, each controlled by the ancilla and swapping the \(k\)-th qubit of the second register with the \(k\)-th qubit of the third register. Thus, we obtain the state:

\[\begin{equation} \frac{1}{\sqrt{2}}\left[|0\rangle(|\psi_1\rangle|\psi_2\rangle) + |1\rangle(|\psi_2\rangle|\psi_1\rangle) \right] \end{equation}\]

We now apply another Hadamard gate on the ancilla qubit, so as to obtain the following state:

\[\begin{align} |\phi\rangle=& \frac{1}{\sqrt{2}}\left[\frac{1}{\sqrt{2}}\left(|0\rangle + |1\rangle\right)|\psi_1\rangle|\psi_2\rangle + \frac{1}{\sqrt{2}}\left(|0\rangle - |1\rangle\right)|\psi_2\rangle|\psi_1\rangle \right] \\ =& \frac{1}{2}\left[|0\rangle\left(|\psi_1\rangle|\psi_2\rangle + |\psi_2\rangle|\psi_1\rangle \right) + |1\rangle\left(|\psi_1\rangle|\psi_2\rangle - |\psi_2\rangle|\psi_1\rangle\right)\right] \end{align}\]

Now we consider the probability of measuring \(0\) and \(1\) in the ancilla qubit. More in detail, we want to estimate \(p(0)=\langle\phi|M_0|\phi\rangle\). For this, we recall Postulate 3 (Proposition 2.3), and more precisely equation (2.1), with \(M_0=|0\rangle\langle 0|\otimes I\), where \(I\) is the identity operator over the \(2n\) qubits of the second and third registers. Since \(\left\lVert |\psi_1\rangle|\psi_2\rangle + |\psi_2\rangle|\psi_1\rangle \right\rVert^2 = 2 + 2|\langle\psi_1|\psi_2\rangle|^2\), we obtain \(p(0)=\frac{2+2|\langle\psi_1|\psi_2\rangle|^2}{4}\).
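A dense-matrix simulation of the swap test, as a sketch assuming NumPy and random real states given directly (instead of applying \(U_1\) and \(U_2\) to \(|0\rangle\)). For simplicity, the \(n\) qubit-wise controlled swaps are implemented as a single permutation that exchanges the two registers, which is the same operator; the helper names are hypothetical.

```python
import numpy as np

def cswap_matrix(n):
    """Controlled swap of two n-qubit registers; the ancilla is the most significant qubit."""
    d = 2 ** n
    swap = np.zeros((d * d, d * d))
    for i in range(d):
        for j in range(d):
            swap[j * d + i, i * d + j] = 1.0      # |i>|j> -> |j>|i>
    ident, zero = np.eye(d * d), np.zeros((d * d, d * d))
    return np.block([[ident, zero], [zero, swap]])

def swap_test_p0(psi1, psi2):
    """Probability of reading 0 on the ancilla after H, controlled swap, H."""
    n = int(np.log2(len(psi1)))
    d2 = len(psi1) ** 2
    h = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
    h_anc = np.kron(h, np.eye(d2))                                 # H on the ancilla only
    state = np.kron(np.array([1.0, 0.0]), np.kron(psi1, psi2))     # |0>|psi1>|psi2>
    state = h_anc @ cswap_matrix(n) @ h_anc @ state
    return np.linalg.norm(state[:d2]) ** 2

rng = np.random.default_rng(5)
psi1 = rng.normal(size=4)
psi1 /= np.linalg.norm(psi1)
psi2 = rng.normal(size=4)
psi2 /= np.linalg.norm(psi2)

p0 = swap_test_p0(psi1, psi2)
overlap_sq = 2 * p0 - 1          # estimate of |<psi1|psi2>|^2
```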

By repeating this measurement \(O(\log(1/\delta)/\epsilon^2)\) times, duly following the statement of the Chernoff bound in Theorem C.8, we have that the number of samples needed to obtain an error \(\epsilon\) with probability \(1-\delta\) is \(t=\frac{\log(1/\delta)}{2\epsilon^2}\). Once we have obtained an estimate of \(p(0)\), we can estimate the sought-after quantity of interest as \(|\langle\psi_1|\psi_2\rangle|^2 = 2p(0)-1\).

**Exercise 2.6 (Obtain the absolute value of the inner product)** In the previous theorem we obtain an estimate of \(|\langle\psi_1|\psi_2\rangle|^2\) with a certain error \(\epsilon\). If we just take the square root of that number, what is the error in the estimation of \(|\langle\psi_1|\psi_2\rangle|\)? You are encouraged to read the section in Appendix E on error propagation.