Chapter 2 Quantum computing and quantum algorithms

In this chapter we will introduce the notation used throughout the rest of the lecture notes. We will extensively use linear algebra (norm of matrices, SVD, properties of particular matrices, and so on), so the reader is higly encouraged to skim the appendix on his/her own, so to know the notation adopted here.

2.1 Getting rid of physics in quantum computing

Following one of the lectures of Susskind, we are going to start from a “handwavy” introduction of quantum mechanics, that starts from few considerations and lead straight to the Schrödinger equation. With a few mathematical tricks, we are going to give an intuitive justification of all the 4 axioms of quantum mechanics that are stated in the next sections. The hope is that thanks to this introduction, the reader can be gently guided from a tiny physical intuition to a greater understanding of the axioms of quantum mechanics. Despite the name “axioms of quantum mechanics” might seems (obviously) related to physics, thanks to this formulation (which comes from (Nielsen and Chuang 2002)), we could eventually think of them as “axioms of quantum computing.” As Scott Aaronson rightly said, if you are not a physicist, you need to remove physics from quantum mechanics to understanding it!

The objective is that the reader should not feel the need to dig into quantum mechanical details of quantum computers, but can comfortably sit on top of the 4 axioms of quantum mechanics, and build (a lot) from there.

When, at the beginning of the 20th century, physicists started to model quantum phenomena, they observed that the dynamic of the systems had two properties: they observed that the time and space evolution of quantum systems is continuous (as in classical mechanics) and reversible (unlike the classical world). They decided to formalize this concept as follows. First, they decided to model the state of a quantum system at time $p$ as a function $\psi(p)$, and they decided to model the evolution of $\psi(p)$ for time $t$ as an operator $U(t)$ acting on $\psi(p)$s. Formally, the two requirements can be written as:

$U(\epsilon)=I-i\epsilon H$ (continuity)
$U^\dagger(t)U(t)=I$ (reversibility)

The first requirement reads that if we were to apply an evolution for a small amount of time $\epsilon$, then $U$ would behave almost as the identiy, and then it will apply for a “small” amount another operator $H$. The second requirement reads that if we “undo” the operator $U$, by applying the transpose conjugate, we would obtain the identity, i.e. we haven’t change the state of the system. From these two requirements, we can already derive the following observation.

\[I = U^\dagger(\epsilon)U(\epsilon)= (I+i\epsilon H)(I-i\epsilon H) = I -i \epsilon H + i \epsilon H^\dagger + O(\epsilon^2).\] The only way for this equation to hold is for $H=H^\dagger$.

I.e. the operator $H$ should be equal to its transpose conjugate. In mathematics, we have a name for such operators, and they are called Hermitian operators! (More about those in the appendix!). Now we can ask ourselves what happens when we apply $U(\epsilon)$ to a quantum state $\psi(t)$? Well it’s simple to see now: \[U(\epsilon)\psi(t) = \psi(t+\epsilon) = \psi(t) -i \epsilon H\psi(t).\] With a little algebra we can rewrite the previous equation as:

\[ \frac{\psi(t+\epsilon) - \psi(t)}{\epsilon} = -i H\psi(t).\] Note that the left-hand side part of this equation can be rewritten as a derivative: \[\frac{d}{dt}\psi(t) = -i H \psi(t).\] But this is the well-known Schrödinger equation! Note that, as computer scientists, we take the right to remove some physical constant ($\hbar$) out of the equation. What should be the takeaway of these observations? Well, first we know that the Schrödinger equation is a differential equation whose solution is fully determined if we were to know the initial state of our system $\psi(p)$. Formally the solution can be written as:

\[\psi(p+t)=e^{-iHt}\psi(p).\] From this last equation we can observe further (more on this in the appendix) that the exponential of an Hermitian matrix $e^{-iHt}$ is defined through its Taylor expansion is just a unitary matrix: $U(t)=e^{-itH}$. Unitary matrices are exactly those matrices that describe isometries: applying a matrix to a vector won’t change its length. From this, we see that the two quantum states $\psi(p+t)$ and $\psi(p)$ could be just taken to be vectors of a fixed length, which - for practicality - we take to be unit vectors. Notation-wise, we denote unit vectors describing quantum states as “kets,” i.e. we rewrite this equation as:

\[|\psi(p+t)\rangle=U(t)|\psi(p)\rangle\] Hopefully, this introduction should be enough for getting a better intuition of what comes next, and give you a “justification” for the axioms of quantum mechanics.

2.2 Axioms of quantum mechanics

The standard formalism used in Quantum Information is the Dirac’s ``bra-ket’’ notation, which we will introduce in this section. We also recall here the postulates of quantum mechanics, and take this opportunity to settle the rest of the notation and preliminaries used in this thesis. For the postulates, we follow the standard formulation in (Nielsen and Chuang 2002).

Proposition 2.1 (Postulate 1) Associated to any isolated physical system is a complex vector space with inner product (that is, a Hilbert space) known as the state space of the system. The system is completely described by its state vector, which is a unit vector in the system’s state space.

As quantum states are described by unit vectors, we write $|\psi\rangle$ for a unit vector $\psi \in \mathcal{H}^n$. So for a non-normalized vector $x \in \mathbb{R}^n$, the normalized quantum state is represented as $|x\rangle = \left\lVert x\right\rVert^{-1}x = \frac{1}{\left\lVert x\right\rVert}\sum_{i=0}^n x_i|i\rangle$. We denote as $\{|i\rangle\}_{i\in[d]}$ the canonical (also called computational) basis for the $d$ dimensional Hilbert space $\mathcal{H}$. The transpose-conjugate of $|x\rangle$ is defined as $\langle x|$. We can think of $|x\rangle$ as a column vector, while $\langle x|$ is a row vector, whose entries have been conjugated. In Dirac’s notation, we denote the inner product between two vector as $\langle x|y\rangle$. Their outer product is denoted as $|x\rangle\langle y| = \sum_{i,j \in [d]}x_i y_j |i\rangle\langle j| \in \mathcal{H}^d\otimes \mathcal{H}^d$. The smallest quantum system is called a qubit, and is a 2 dimensional unit vector in $\mathbb{C}^2$. A base for this vector space in quantum notation is denoted as $|0\rangle$ and $|1\rangle$. In this case, the vector $|\varphi\rangle = \alpha|0\rangle+\beta|1\rangle$ for $\alpha,\beta\in \mathbb{C}$ represent a valid quantum state as long as $|\alpha|^2 + |\beta|^2 = 1$.

Proposition 2.2 (Postulate 2) The evolution of a closed quantum system is described by a unitary transformation. That is, the state $|\psi\rangle$ of the system at time $t_1$ is related to the state $|\psi\rangle$ of the system at time $t_2$ by a unitary operator $U$ which depends only on the times $t_1$ and $t_2$.

A matrix $U\in \mathbb{C}^{d \times d}$ is said to be unitary if $UU^\dagger = U^\dagger U = I$, that is, if the inverse of $U$ equal to its conjugate transpose. From this fact it follows that unitary matrices are norm-preserving, and thus can be used as suitable mathematical description of a pure quantum evolution. It is a standard exercise to see that the following are all equivalent definition of unitary matrices (De Wolf 2019):

$\langle Av, Aw\rangle = \langle v,w\rangle$ for all $v,w$.
$\left\lVert Av\right\rVert = \left\lVert v\right\rVert$ for all $v$
$\left\lVert Av\right\rVert = 1$ if $\left\lVert v\right\rVert=1$.
$U$ is a normal matrix with eigenvalues lying on the unit circle
$|\det(U)|=1$
The columns and the rows of $U$ form an orthonormal basis of $\mathcal{C}^d$
$U$ can be written as $e^{iH}$ for some Hermitian operator $H$.

Example 2.1 (Determinant = 1 is a necessary but not sufficient condition for being unitary.) It is simple to see that any 2x2 diagonal matrix $A$ with entries $10$ and $1/10$ has determinant is 1, but it’s not a unitary matrix: $A^\dagger A = AA^\dagger \neq I$.

It will be useful to recall that if we have a unitary that perform the mapping $|a_i\rangle \mapsto |b_i\rangle$, we can have the “matrix” form of the opertaor as $\sum_i |b_i\rangle\langle a_i|$. Recall also that the Pauli matrices are both unitary and Hermitian, and this fact will be useful in many places throughout this text.

Exercise 2.1 (From (Huang, Bharti, and Rebentrost 2019)) Let $k \in \{0,1\}^n$ be an arbitrary $n$-bitstring. Let $A=(\sigma_{x}^{(1)})^{k_1} \otimes \dots \otimes (\sigma_{x}^{(n)})^{k_n}$ and $|b\rangle=|0^n\rangle$. What is the solution to the quation $A|x\rangle =|b\rangle$

Proposition 2.3 (Postulate 3) Quantum measurements are described by a collection $\{M_m\}$ of measurement operators. These are operators acting on the state space of the system being measured. The index $m$ refers to the measurement outcomes that may occur in the experiment. If the state of the quantum system is $|\psi\rangle$ immediately before the measurement, then the probability that the result $m$ occurs is given by \[ p(m) = \langle\psi|M^\dagger_m M_m |\psi\rangle \] and the state of the system after the measurement is \[ \frac{M_m|\psi\rangle}{\sqrt{\langle\psi|M_m^\dagger M_m|\psi\rangle}} \] The measurement operators satisfy the \[ \sum_m M^\dagger _m M_m = I \]

In practice, we will mostly perform projective measurements (also called von Neumann measurements). A projective measurement is described by an observable: an Hermitian operator $M$ on the state space of the system being observed. The observable has a spectral decomposition: \[ M = \sum_m mP_m \] Where $P_m$ is a projector into the eigenspace of $M$ associated with the eigenvalue $m$. This means that the measurement operator will satisfy the following properties:

$P_m$ is positive definite
$P_m$ is Hermitian
$\sum_m P_m = I$
$(P_m)(P_n) = \delta_{mn}(P_m)$ are orthogonal projections.

Recall that an orthogonal projector $P$ has the properties that $P=P^{\dagger}$ and $P^2 = P$. Note that the second property derives from the first: all positive definite operators on $\mathbb{C}$ are Hermitian (this is not always the case for positive definite operators on $\mathbb{R}$, as it is simple to find positive definite matrices that are not symmetric). Projective measurements can be understood as a special case of Postulate 3: in addition to satisfy the completeness relation $\sum_m M_m^\dagger M_m = I$ they also are orthogonal projectors. Given a state $|\psi\rangle$, the probability of measuring outcome $m$ is given by:

\[\begin{equation} p(m) = \langle\psi|P_m|\psi\rangle. \tag{2.1} \end{equation}\] If we were to measure outcome $m$, then the state of the quantum system after the measurement would be: \[\frac{P_m|\psi\rangle}{\sqrt{p(m)}} .\]

They have some useful properties. Just to cite one, the average value of a projective measurement in a state $|\psi\rangle$ is define as: \[\begin{align} E(M)& = \sum_m p(m)\\ & = \sum_m m \langle\psi|P_m|\psi\rangle\\ & \langle\psi|(\sum_m mP_m)|\psi\rangle\\ & \langle\psi|M|\psi\rangle \end{align}\] In practice, our projective operators will be projectors in the computational basis, i.e. $P_m = \sum_{m \in [d]} |m\rangle\langle m|$. From these rules, it is simple to see that the probability that a measurement on a state $|x\rangle = \frac{1}{\left\lVert x\right\rVert}\sum_i x_i|i\rangle$ gives outcome $i$ is $|x_i|^2/\left\lVert x\right\rVert^2$.

Proposition 2.4 (Postulate 4) The state space of a composite physical system is the tensor product of the state spaces of the component physical systems. Moreover, if we have systems numbered from 1 through $n$, and each state is described as $|\psi_i\rangle$, the join state of the total system is $\bigotimes_{j=1}^n |\psi_i\rangle=|\psi_1\rangle|\psi_2\rangle\dots |\psi_n\rangle$.

To describe together two different quantum system we use the tensor product. The tensor product between two vectors $|y\rangle \in \mathbb{R}^{d_1}$ and $|y\rangle \in \mathbb{R}^{d_2}$ is a vector $|z\rangle \in \mathbb{R}^{d_1 \times d_2}$. We can use the tensor operation to describe the joint evolution of separate quantum system. TODO FIXME

Even if it’s not explicitly used much in quantum algorithms, it’s useful to recall the definition of entangled pure state.

Definition 2.1 (Entangled state) A quantum state that cannot be expressed as tensor product of two quantum state is said to be entangled.

The same thing can be done for operators. Let $U_1$ be the unitary describing the evolution of a quantum state $|x\rangle$ and $U_2$ the unitary describing the evolution of a quantum state $|y\rangle$. Then $U_1 \otimes U_2$ describes the evolution of the quantum system $|x\rangle\otimes |y\rangle$. Note that to build a state in $|v\rangle \in \mathcal{H}^n$ we need $\lceil \log n\rceil$ qubits, and this fact will be extensively leveraged in our quantum algorithms.

2.2.1 Review of important statements in quantum computation

Before delving into a review of quantum algorithms, we would like to state here a few important lemmas.

Lemma 2.1 (Hadamard on a bitstring (Nielsen and Chuang 2002)) Let $x \in \{0,1\}^n$ be a bitstring, and $H$ be an Hadamard gate. Then:

\[H^{\otimes n}|x\rangle = \frac{1}{\sqrt{2^n}}\sum_{z_1, \dots z_n \in \{0,1\}^n} (-1)^{x_1z_1 + x_2z_2 + \dots + x_nz_n} |z_1, \dots, z_n\rangle = \frac{1}{\sqrt{2^n}}\sum_{z\in \{0,1\}^n} (-1)^{x^Tz} |z\rangle\] Where $x^Tz$ is the bitwise inner product of strings in $Z_2^n$ modulo $2$.

2.3 Measuring complexity of quantum algorithms

This section is an attempt to organize in a coherent way some fundamental concepts in quantum computer science. The formalization of some of these concepts comes from (Dörn 2008) and (De Wolf 2019).

There are various ways to measure the complexity of a quantum algorithm. We denote with $T(U)$ the time complexity needed to implement $U$, measured in terms of number of gates of the circuit. This is a concept that bears some similarity with the clock rate of classical CPUs.

We use a standard notation of $\widetilde{O}$ for hiding polylogarithmic factors in the big-O notation of the algorithms: $O(\log(n))=\widetilde{O}(1)$.

Definition 2.2 (Quantum query or oracle access to a function) Let $\mathcal{H}$ be a finite-dimensional Hilbert space with basis $\{0,1\}^{n}$. Given $f:\{0,1\}^n\to\{0,1\}^m$, we say that we have quantum query access to $f$ if we have access to a unitary operator $U_f$ on $\mathcal{H}\otimes\mathbb{C}^{2^n}$ such that $U|x\rangle|b\rangle = |x\rangle|b \oplus f(x)\rangle$ for any bit string $b\in\{0,1\}^m$. One application of $U_f$ costs T_f operations$.

Definition 2.3 (Quantum computation in the query model) Let $O_x$ be a unitary operator that encodes the input of our computation, and acts in a non-trivial way on its associated Hilbert space. A quantum computation with $T$ queries to an oracle $O_x : |i,b,z\rangle \mapsto |i,b \oplus x_i, z\rangle$ is a sequence of unitary transformations:

\[U_0O_xU_1O_x \dots U_{T-1}O_xUT\] Where each $U_t$ is a unitary that does not depend on the input of the algorithm. The output of the computation is obtained by measuring the rightmost register (by convention).

Note that the second register holds the XOR of the $i$-th component of the input with the previous state of the register (i.e. the b). This is to make the computation reversible. Importantly, the definition 2.2 is just an example of function for which we can have query access. We can assume query access to unitaries creating various kind of quantum states as output. We will see many examples of oracles as definition 2.9, 2.11, 2.12, and 2.13.

This is the so-called query model, or oracle model of computation. The important thing here is the last statement of 2.2 about the cost of applying $U_f$ is $O(1)$. There are multiple reasons for working in this model. First, it is often the case that queries to these oracles are actually efficient (as we will see in many example), so the query complexity is actually equivalent (up to multiplicative polylogarithmic factors) to the depth of the quantum circuit that is going to be executed. Another reason is that in the oracle model is relatively simple to prove lower bounds and results about the complexity of an algorithm in terms of the number of queries to an oracle that encodes the input of the problem. It is customary, for complex results in quantum algorithms to separate the study of the query complexity of the problem and the depth of the quantum circuit which is executed on the real hardware. We formalize more this difference in the following definitions.

Definition 2.4 (Query complexity) The quantum query complexity of a quantum algorithm $\mathcal{A}$ is the number of queries to a black-box made by $\mathcal{A}$ in order to compute $f$.

If we just care about the relativized complexity, we might limit ourselves to compare two algorithms that solve the same problem in terms of the number of queries to a given oracle, we might observe that one is faster than the other. This is a relativized speedup. The oppositive is an absolute speedup, i.e. when we also take into account the complexity of the operations that are not queries to an oracle. In the case of quantum algorithms, these might simply be the gate depth of the circuit.

Definition 2.5 (Circuit complexity or time complexity) The quantum circuit complexity (or time complexity) of a quantum algorithm $\mathcal{A}$ is the depth of the quantum circuit implementing $\mathcal{A}$.

Quantum computing is not the only place where we measure the complexity in terms of query to an oracle. In fact, it’s sufficient to do a few “queries” (pun intended) on your search engine to realize that in many computational models we have adopted this measure of computational complexity.

Note that the query complexity of an algorithm is a lower bound on the gate complexity of the quantum circuit. It is often simpler to study first the query complexity of a quantum algorithm and then study the time complexity. For most quantum algorithms (but not all!) the time complexity coincides with the query complexity, up to a logarithmic factor. Note that, if we find a way to have an oracle whose depth (i.e. circuit complexity) is only (poly)logarithmic in the input size, then the query complexity and the gate complexity coincide up to a negligible polylogarithmic factor. There are some exceptions. Most notably, there is a quantum algorithm for the important hidden subgroup problem with only polynomial query complexity, while the classical counterpart has a query complexity that is exponential in the input size. Nevertheless, the overall time complexity of the quantum algorithm is (to date) still exponential, and polynomial-time quantum algorithms are known only for a few specializations of the problem.

We will clarify better some definitions that are used to describe the probabilistic behavior of an algorithm:

Definition 2.6 (Kind of randomized algorithms) Let $f : \{0,1\}^N \mapsto \{0,1\}$ be a Boolean function. An algorithm computes $f$:

exactly if the outputs equals $f(x)$ with probability 1 for all $x \in\{0,1\}^N$
with zero error if it is allowed to give the answer “UNDEFINED” with probability smaller than $1/2$ (but if the output is $0$ or $1$ it must be correct)
with bounded error if the output equals $f(x)$ with probability greater than $2/3$ for all $x\in \{0,1\}^N$.

A bounded error (quantum or classical) algorithm that fails with probability $1/3$ (or any other constant smaller than $1/2$) is meant to fail in the worst-case. We do not expect the algorithm to fail in the average case, i.e. for most of the inputs (see Appendix of (De Wolf 2019)).

If a (quantum or classical) algorithm is said to output the right answer in expected (oftain said “in expectation”) running time $T$, we can quickly create another algorithm that has worst-case guarantees on the runtime. This is obtained using the Markov’s inequality, i.e. theorem C.2 as follows. Run the algorithm for $kT$ steps, i.e.. stop the execution after $kT$ steps if it hasn’t terminated already. If $X$ is the random variable of the runtime of the computation (so $\mathbb{E}[X]=T$), then:

\[Pr\left[X > kT \right] \leq \frac{1}{k} \] So with probability $\geq 1-\frac{1}{k}$ we will have the output of the algorithm.

2.4 Review of famous quantum algorithms

In this chapter we will explore some introductory quantum algorithms. Some of them are not particularly related to data analysis or machine learning, but given their potential to help us better understand the model of quantum computation that we adopt, we decided it was important to report them here. Others will prove to be really useful subroutines for the quantum machine learning practitioner.

2.4.1 Deutsch-Josza

Definition 2.7 (Constant function) A function $f :\{0,1\}^n \mapsto \{0,1\}$ is constant if $f(x)=0 \forall x \in \{0,1\}^n$ or $f(x)=1 \forall x \in \{0,1\}^n$.

Definition 2.8 (Balanced function) A function $f :\{0,1\}^n \mapsto \{0,1\}$ is balanced if $f(x)=0$ for half of the inputs and $f(x)=1$ for the other half.

Theorem 2.1 (Deutsch-Josza (Deutsch and Jozsa 1992)) Assume to have quantum access (as definition 2.2 ) to a unitary $U_f$ that computes the function $f :\{0,1\}^n \mapsto \{0,1\}$, which we are promised to be either constant or balanced. There is a quantum algorithm that decides which is the case with probabiliy $1$, using $U_f$ only once and using $O(\log(n))$ other gates.

Proof. We start our quantum computer initializing $n$ qubit as $|0\rangle$ state follwed by a single ancilla qubit initialized in state $|1\rangle$, which we will use for the phase-kickback. Then, we apply the Hadamard transform on each of them. Mathematically, we are performing the following mapping:

\[\begin{equation} |0\rangle^{\otimes n}|1\rangle \mapsto \left(\frac{1}{\sqrt{2^n}}\sum_{x\in\{0,1\}^n} |x\rangle \right)|-\rangle \end{equation}\] Now we apply $U_f$ using the first register as input and the ancilla register as output. Our quantum computer is now in the state \[\left(\frac{1}{\sqrt{2^n}}\sum_{x\in\{0,1\}^n}(-1)^{f(x)}|x\rangle \right)|-\rangle\] Now we apply $n$ Hadamard gates to the $n$ qubits in the first registers. Recalling lemma 2.1, this gives the state

\[\left(\frac{1}{2^n} \sum_{x\in\{0,1\}^n}(-1)^{f(x)} \sum_{j \in \{0,1\}^n }(-1)^{xj} |j\rangle \right) |+\rangle = \left(\frac{1}{2^n} \sum_{x\in\{0,1\}^n}\sum_{j \in \{0,1\}^n} (-1)^{f(x)+ xj} |j\rangle \right)|+\rangle\]

In this state, note that the normalization factor has changed from $\frac{1}{\sqrt{2^n}}$ to $\frac{1}{2^n}$, and recall that $(-1)^{xj}$ is read as $(-1)^{ \sum_{p} x_pj_p \text{mod}2 }$. The key idea of the proof of this algorithm lies in asking the right question to the previous state: what is the probability of measuring the state $|0\rangle^n$ in the first register? The answer to this question will conclude the proof of this theorem. Before looking at the probability, observe that the amplitude of the state $|j=0\rangle$ we will see that it is just $\frac{1}{2^n}\sum_{x}(-1)^{f(x)}$, as $x^Tj=0$ if $j=0_1\dots 0_n$, for all $x$. Then,

\[\begin{equation} \frac{1}{2^n} \sum_{i \in \{0,1\}^n } (-1)^f(x) = \begin{cases} 1 & \text{if } f(x)=0 \forall x \\ -1 & \text{if } f(x)=1 \forall x \\ 0 & \text{if } f(x) \text{is balanced} \end{cases} \end{equation}\]

To conclude, reckon that if the function $f$ is constant (first two cases), we will measure $|0\rangle^{\otimes n}$ with probability $1$, and if the function is balanced, we will measure some bitstring of $n$ bits that is different than the string $0_1\dots 0_n$.

It’s simple to see that if we want to solve this problem with a classical deterministic algorithm, we need exactly $2^n/2 + 1$ queries. However, with the usage of a randomized algorithm we can drastically reduce the number of queries by admitting a small probability of failure.

Exercise 2.2 Can you think of an efficient randomized classical algorithm for solving this problem? Perhaps you can use the tools in the Appendix for randomized algorithms.

We now turn our attention to the first learning problem of this book. This is rarely stressed that the following algorithm can be interpreted as a learning algorithm.

2.4.2 Bernstein-Vazirani

Theorem 2.2 (Bernstein-Vazirani) Assume to have quantum access (as definition 2.2 ) to a unitary $U_f$ that computes the function $f :\{0,1\}^n \mapsto \{0,1\}$, which computes $f_a(x) = (x,a) = ( \sum_i^n x_i a_i )\mod 2$ for a secret string $a \in \{0,1\}^n$. There is a quantum algorithm that learns $a$ with probability $1$, using $U_f$ only once and $O(\log(n))$ other gates.

Proof. The algorithm follows exactly the same steps as the Deutsch-Josza algorithm. The proof is slightly different, and start by noting that, after the application of the oracle $U_f$, the register of our quantum computer is in the following state:

\[\left(\frac{1}{2^n} \sum_{x\in\{0,1\}^n}(-1)^{f(x)} |x\rangle \right) |+\rangle = \left(\frac{1}{2^n} \sum_{x\in\{0,1\}^n} (-1)^{a^T x}|x\rangle \right)|+\rangle\]

Now we resort again to Lemma (lem:hadamard-on-bitstring), and we use the fact that the Hadamard it is also a self-adjoint operator (i.e. it is the inverse of itself: $H^2 = I$). Thus applying $n$ Hadamard gates to the first register leads to the state $|a\rangle$ deterministically.

Exercise 2.3 Can you think of an efficient randomized classical algorithm for solving Berstain-Vazirani problem? You can use the tools in the Appendix for randomized algorithms.

Other material for learning about Deutsch-Josza and Bernstein-Vazirani algorithms are the lecture notes of Ronald de Wolf that you can find here.

2.4.3 Swap test

The next algorithm was originally proposed in (Buhrman et al. 2001), in the context of quantum fingerprinting, but it has been quickly extended to many other context. For us, the swap test is a way to obtain an estimate of an inner product between two quantum states.

Theorem 2.3 (Swap test (no amplitude amplification)) Assume to have access to a unitary $U_1$ that produces a state $U_1 |0\rangle = |\psi_1\rangle$ and a unitary $U_2$ that produces a state $|\psi_2\rangle$, where $|\psi_1\rangle,|\psi_2\rangle \in \mathbb{R}^N$ for $N=2^n, n\in\mathbb{N}$. There is a quantum algorithm that allows to estiamte the quantity $|\langle\psi_1|\psi_2\rangle|^2$ with additive precision $\epsilon$ using $U_1$ and $U_2$ $O(\frac{\log(1/\delta)}{\epsilon_2})$ times with probability $1-\delta$

Proof. Create a state $|0\rangle|0\rangle|0\rangle$ where the first register is just an ancilla qubit, and the second and third register have $n$ qubits. Then, apply an Hadamard gate to the first qubit, so to obtain $|+\rangle|0\rangle|0\rangle$. Then, apply $U_1$ and $U_2$ to the second and third register, and then apply a controlled swap gate controlled on the ancilla qubit, targeting the two registers. More precisely, we apply $n$ controlled swap gates, each controlling a single qubit of the second and third register. Thus, we obtain the state: \[\frac{1}{\sqrt{2}}\left[|0\rangle(|\psi_1\rangle|\psi_2\rangle) + |1\rangle(|\psi_2\rangle|\psi_1\rangle) \right] \]

we now apply another Hadamard gate on the ancilla qubit, so to obtain the following state:

\[\begin{align} |\phi\rangle=& \frac{1}{\sqrt{2}}\left[\frac{1}{\sqrt{2}}\left(|0\rangle(|\psi_1\rangle|\psi_2\rangle) + |1\rangle(|\psi_1\rangle|\psi_2\rangle)\right) + \frac{1}{\sqrt{2}}\left(|1\rangle(|\psi_2\rangle|\psi_1\rangle) - |0\rangle(|\psi_2\rangle|\psi_1\rangle) \right) \right] \\ =& \frac{1}{2}\left[|0\rangle\left(|\psi_1\rangle|\psi_2\rangle) + |\psi_2\rangle|\psi_1\rangle \right) + |1\rangle\left(|\psi_1\rangle|\psi_2\rangle) - |\psi_2\rangle|\psi_1\rangle\right)\right] \end{align}\]

Now we consider the probability of measuring $0$ and $1$ in the ancilla qubit. More in detail, we want to estimate $p(0)=\langle\phi|M_0|\phi\rangle$. For this, we recall our Postulate 2.3, and more precisely equation (2.1), with $M_0=|0\rangle\langle 0|\otimes I$, where $I$ is the identiy operator over $n$ qubits.

It is simple to see that $p(0)=\frac{2-|2\langle\psi_1|\psi_2\rangle|^2}{4}$.

By repeating this measurement $O(\log(1/\delta)/\epsilon^2)$ times, duly following the statement of the Chernoff bound in theorem C.8, we have that the number of samples needed to obtain an error $\epsilon$ with probability $1-\delta$ is $t=\frac{\log(1/\delta)}{2\epsilon^2}$. Once we have obtained an estimate of $p(0)$, we can estiamte the sought-after quantity of interest as $|\langle\psi_1|\psi_2\rangle|^2 = 1-2p(0)$.

Exercise 2.4 (Obtain the absolute value of the inner product) In the previous theorem we obtain an estimate of $|\langle\psi_1|\psi_2\rangle|^2$ with a certain error $\epsilon$. If we just take the square root of that number, what is the error in the estimation of $|\langle\psi_1|\psi_2\rangle|$? You are encouraged to read the section in the appendix E on error propagation.

2.4.4 Hadamard test

Theorem 2.4 (Hadamard test (no amplitude amplification)) Assume to have access to a unitary $U_1$ that produces a state $U_1 |0\rangle = |\psi_1\rangle$ and a unitary $U_2$ that produces a state $|\psi_2\rangle$, where $|\psi_1\rangle,|\psi_2\rangle \in \mathbb{R}^N$ for $N=2^n, n\in\mathbb{N}$. There is a quantum algorithm that allows to estiamte the quantity $\langle\psi_1|\psi_2\rangle$ with additive precision $\epsilon$ using controlled applications of $U_1$ and $U_2$ $O(\frac{\log(1/\delta)}{\epsilon_2})$ times, with probability $1-\delta$

Proof. Create a state $|0\rangle|0\rangle|0\rangle$ where the first register is just an ancilla qubit, and the second and third register have $n$ qubits. Then, apply an Hadamard gate to the first qubit, so to obtain $|+\rangle|0\rangle|0\rangle$. Then, controlled on the first register being $0$, we apply the unitary $U_1$, and controlled on the register being $1$, we apply the unitary $U_2$. Then, we apply again the Hadamard gate on the ancilla qubit. The state that we obtain is the following:

\[\begin{equation} (H\otimes I \otimes I) \frac{1}{\sqrt{2}}\left( |0\rangle|\psi_1\rangle|0\rangle + |1\rangle|0\rangle|\psi_2\rangle \right) \\ = \frac{1}{2}\left(|0\rangle(|\psi_1\rangle|0\rangle + |0\rangle|\psi_2\rangle) + |1\rangle(|\psi_1\rangle|0\rangle - |0\rangle|\psi_2\rangle) \right) \end{equation}\]

Again, now it is easy to state that the probability of measuring $0$ is:

\[\begin{equation} p(0)=\frac{2+2\langle\psi_1|\psi_2\rangle}{4} \end{equation}\]

We conclude the proof by recalling the Chernoff bound in theorem C.8, as we did for the proof of the swap test.

Can you think of the reasons that might lead one to prefer the swap test over the Hadamard test, or vice versa? At the end of the day, they are both computing the same thing? For instance, note that for the Hadamard test, we are requiring the ability to call the controlled version of the unitaries $U_1$, and $U_2$, while for the swap test, we can just trat them as black-box: these can be quantum states that we obtain from a quantum process, or that we obtain from a quantum communication channel.

2.5 Representing data in quantum computers

Before understanding how to process data with a quantum computer, we need to step back, and understand how we can represent and store data as a quantum state. This problem is of paramount importance, because knowing what is the best way of encoding data in a quantum computer might pave the way for intuitions in solving our problems. On the contrary, using the wrong encoding might prevent you from reasoning about the right algorithm design, and obtaining the desired advantages in the implementation of your algorithm. Indeed, in (Schuld, Sinayskiy, and Petruccione 2015), Schuld and others wrote: In order to use the strengths of quantum mechanics without being confined by classical ideas of data encoding, finding “genuinely quantum” ways of representing and extracting information could become vital for the future of quantum machine learning.

We start by focusing on representing numbers. In this context, we want to build a unitary operator $U$ that performs the following mapping: \[U|i\rangle|0\rangle\to |i\rangle|x_i\rangle\] Where $|x_i\rangle$ is the $i$-th element of our input, which can be a vector or a number, as we will see now. As a convention, throughout this book, we often use Greek letters inside kets to represent generically quantum states, and use Latin letters to represent quantum registers holding classical data. The first is called the index register, and the second is the target register, which after the query to a certain oracle is storing the information you want. Fundamentally, there are just two ways of encoding the information: the amplitude encoding and the binary encoding. In amplitude encoding you store your data in the amplitudes of a quantum state, therefore you can encode $n$ real values (or better, some fixed point precision approximation of a real number, up to some digits of precision) using $O(\log n)$ qubits. In the binary or digital encoding you store a bit in the state of each qubit. Each encoding allows to process the data in different ways, unlocking different possibilities.

2.5.0.1 Scalars: $\mathbb{Z}$, $\mathbb{Q}$, $\mathbb{R}$

A possible way to store these values is through the binary encoding. Let’s start with the most simple scalar: an integer. Let $x \in \mathbb{N}$. We consider the binary expansion of $x$ encoded in $m$ qubits. As example, if your number’s binary expansion is $0100\cdots0111$ we can create the state: $|x\rangle = |0\rangle\otimes |1\rangle \otimes |0\rangle \otimes |0\rangle \cdots |0\rangle \otimes |1\rangle \otimes |1\rangle \otimes |1\rangle$. Formally, for a number $x=x_0, x_1, \dots x_m$ we write:

\[|x\rangle = \bigotimes_{i=0}^{m} |x_i\rangle\] Eventually, we can use one more qubit for the sign. If we were to access more numbers, the unitary reads $U|i\rangle|0\rangle \mapsto |i\rangle|x_i\rangle$.

When programming quantum machine learning algorithms, it is common to use subroutines to perform arithmetic on real numbers. The numbers might come from other quantum procedures (like phase estimation or amplitude amplification) or might come directly from the memory (i.e. like the norm of a vector). These numbers are simply encoded like the one shown above, as a binary expansion of $m$ bits.

This encoding can be used to get speedup in the number of query to an oracle, like in (Kapoor, Wiebe, and Svore 2016). There, the authors encoded a real value representing each of the feature of a machine learning dataset with this encoding. In general, you can easily use this encoding if you aim at getting a speedup in oracle complexity using amplitude amplification and similar techniques, or in an intermediate step of your algorithm where you want to perform arithmetic on some values. As we said in the introduction, the oracle complexity is a measure of the complexity of an algorithm that counts the number of times a given oracle (which in our case is just a unitary) is called. Obviously, the precision you get in storing a number is given by the number of qubits you use in a quantum register.

2.5.0.2 Binary vectors: $\{0,1\}^n$

Let $\vec{b} \in \{0,1\}^n$. For the sake of learning, we can play with three different possibilities. As for the encoding the integers, we can use $m$ qubits with a binary encoding:

\[|b\rangle = \bigotimes_{i=0}^{n} |b_i\rangle\]

As an example, suppose you want to encode the vector $b=[1,0,1,0,1,0] \in \{0,1\}^6$, which is $42$ in decimal. This will correspond to the $42$-th base of the Hilbert space where our qubits are living. Note that, in some sense, we are not “fully” using the $2^n$ dimensional Hilbert space $\mathbb{C}^{2^{n}}$: we are mapping our list of binary vector in vectors of the computational (i.e. canonical) base. Note that, as a consequence of this choice, Euclidean distances in the Hilbert space between points have a different meaning.

Remark (Difference between distances in this binary representation). Consider again $v=[1,0,1,0,1,0]$ and $u=[1,1,1,0,1,1] \in \{0,1\}^6$. It is simple to see that $d(u,v)=\|u-v\|_2= 2.236..$. But $\| |42\rangle - |59\rangle \|_2 = \sqrt{2}$.

Our second encoding that we can imagine is an example of amplitude encoding. We can either map:

a $0$ into a $0$, and $1$ as $1$ (or $-1$)
a $0$ into a $1$, and $1$ as $1$ (or $-1$) (You can, but why would you do that?)

\[\begin{equation} |b\rangle = \frac{1}{\sqrt{\sum_{i=0}^n(-1)^{b_i}}} \sum_{i \in \{0,1\}^n} (-1)^{b_i} |i\rangle \tag{2.2} \end{equation}\]

Or, eitherway:

\[|b\rangle = \frac{1}{\sqrt{\sum_{i=0}^nb_i}} \sum_{i=0}^n b_i|i\rangle \] Note that these two last binary encodings are using the amplitudes, so we are using $\lceil \log n\rceil$ qubits. In the last two equations, the normalization factor is just the $\ell_2$ norm of the vector (which in this case is just the square root of the Hamming weight).

2.5.0.3 Vectors and matrices: $\mathbb{R}^n$, $\mathbb{R}^{n \times d}$

Here is an example of amplitude encoding, a powerful approach in quantum machine learning algorithms for encoding the dataset. For a vector $\vec{x} \in \mathbb{R}^{2^n}$, we can build:

\[|x\rangle = \frac{1}{{\left \lVert x \right \rVert}}\sum_{i=0}^{N}\vec{x}_i|i\rangle = |\vec{x}|^{-1}\vec{x}\]

Note that to span a space of dimension $N=2^n$, you just need $n$ qubits: we encode each component of the classical vector in the amplitudes of a state vector. Being able to build this family of states enable us to load matrices in memory.

Let $X \in \mathbb{R}^{n \times d}$, a matrix of $n$ vectors of $d$ components. We will encode them using $log(d)+log(n)$ qubits as the states:

\[\frac{1}{\sqrt{\sum_{i=0}^n {\left \lVert x(i) \right \rVert}^2 }} \sum_{i=0}^n {\left \lVert x(i) \right \rVert}|i\rangle|x(i)\rangle\]

Or, put it another way:

\[\frac{1}{\sqrt{\sum_{i,j=0}^{n,d} {\left \lVert X_{ij} \right \rVert}^2}} \sum_{i,j=0}^{n,d} X_{ij}|i\rangle|j\rangle\]

We will see that, if we have a classical description of this vector, and we are allowed to perform some preprocessing before running our algorithm, we can create this state using a 2.9, which we will see better in the next section. In short, a QRAM gives us quantum access to two things: the norm of the rows of a matrix and the rows itself. Formally, we assume to have access to $|i\rangle\mapsto |i\rangle|x(i)\rangle$ and $|i\rangle \mapsto |i\rangle|\left\lVert x(i)\right\rVert\rangle$. Using these unitaries, we can do the following mapping:

\[\sum_{i=0}^{n} |i\rangle |0\rangle \to \sum_{i=0}^n {\left \lVert x(i) \right \rVert}|i\rangle|x(i)\rangle\]

Many example of this encoding can be found in literature, and this text is mostly based on this representation.

2.5.0.4 Graphs

For some kind of problems we can even change the computational model (i.e. we can switch from the gate model to something else). For instance, a graph $G=(V,E)$ be encoded as a quantum state $|G\rangle$ such that: \[K_G^V|G\rangle = |G\rangle \forall v \in V\] where $K_G^v = X_v\prod_{u \in N(v)}Z_u$, and $X_u$ and $Z_u$ are the Pauli operators on $u$. Here is the way of picturing this encoding: Take as many qubits in state $|+\rangle$ as nodes in the graph, and apply controlled $Z$ rotation between qubits representing adjacent nodes. There are some algorithms that use this state as input. For a more detailed description, look at (Zhao, Pérez-Delgado, and Fitzsimons 2016), which also gives very nice algorithms that might be useful in some graph problem.

2.5.0.5 Remarks on data representation

The precision that we can use for specifying the amplitude of a quantum state might be limited - in practice - by the precision of our quantum computer in manipulating quantum states (i.e. development in techniques in quantum metrology and sensing). Techniques that use a certain precision in the amplitude of a state might suffer of initial technical limitations of the hardware. The precision in the manipulation can be measured by the fidelity.

2.6 Loading data: quantum memory models

Along with a fully fledged quantum computer, we also assume to have access to a quantum memory, i.e. a classical data structure that store classical information, but that is able to answer queries in quantum superposition. This model is commonly called the . As we will see in greater details soon, the task of building the data structure classically requires time that is linear (up to polylogarithmic factors) in the dimension of the data. This observation is better detailed in definition 2.10. For instance, if we want to have quantum access to a dense matrix $M \in \mathbb{R}^{n \times d}$ the preprocessing runtime will be $O(nd \log (nd))$. To stress more the fact that we are linear in the effective number of elements contained in the matrix (which can often be sparse) can write $\tilde{O}(\left\lVert A\right\rVert_0)$. The name QRAM is meant to evoke the way classical RAM address the data in memory using a tree structure. In the data structure one can write down the real entries $m_{ij}$ with some precision $\delta$ using $\log 1/\delta$ bits.

Note that sometimes, QRAM goes under the name of QROM, as actually it is not something that can be written during the runtime of the quantum algorithm, but just queried, i.e. read. Furthermore, a QRAM is said to be efficient if can be updated by adding, deleting, or modifying an entry in polylogarithmic time w.r.t the size of the data it is storing. Using the following definition, we can better define the computational model we are working with.

Definition 2.9 (Quantum Random Access Memory (Giovannetti, Lloyd, and Maccone 2008)) A quantum random access memory is a device that stores indexed data $(i,x_i)$ for $i \in [n]$ and $x_i \in \mathbb{R}$ (eventually truncated with some bits of precision). It allows query in the form $|i\rangle|0\rangle\mapsto |i\rangle|x_i\rangle$, and has circuit depth $O(polylog(n))$.

We say that a dataset is efficiently loaded in the QRAM, if the size of the data structure is linear in the dimension and number of data points and the time to enter/update/delete an element is polylogarithmic in the dimension and number of data points. A direct application of this model is what is commonly assumed in quantum algorithms to access sparse matrices. This is often useful when dealing with graphs, or simply with sparse Hamiltonians.

Equipped with this definition - which we will not discuss in depth, but it’s discussed here (Giovannetti, Lloyd, and Maccone 2008) - allows us to load all kind of data in the quantum computer.

Definition 2.10 (QRAM model (Kerenidis and Prakash 2020)) An algorithm in the QRAM data structure model that processes a data-set of size $m$ has two steps:

A pre-processing step with complexity $\widetilde{O}(m)$ that constructs efficient QRAM data structures for storing the data.
A computational step where the quantum algorithm has access to the QRAM data structures constructed in step 1.

The complexity of the algorithm in this model is measured by the cost for step 2.

When we say that we have “oracular access” to a matrix (or a vector), we mean the following.

Definition 2.11 (Query access to a matrix) Let $V \in \mathbb{R}^{n \times d}$. There is a data structure to store $V$ such that, a quantum algorithm with access to the data structure can perform $|i\rangle|j\rangle|z\rangle \to |i\rangle|j\rangle|z \oplus v_{ij}\rangle$ for $i \in [n], j \in [d]$.

With the vector stored in a QRAM, we can have efficient (i.e. in time $O(\log(n))$) query access to a matrix or a vector. A matrix can be accessed also in another way.

Definition 2.12 (Oracle access in adjacency list model) Let $V \in \mathbb{R}^{n \times d}$, there is an oracle that allows to perform the mappings:

$|i\rangle\mapsto|i\rangle|d(i)\rangle$ where $d(i)$ is the number of entries in row $i$, for $i \in [n]$, and
$|i,l\rangle\mapsto|i,l,\nu(i,l)\rangle$, where $\nu(i,l)$ is the $l$-th nonzero entry of the $i$-th row of $V$, for $l < d(i)$.

Note that Definition 2.11 is very similar to the definition of the Grover’s oracle that you might know from previous courses in quantum information. It’s important to remember that for Definition 2.11 and 2.12 we might not need a QRAM, as there might be other efficient circuit for performing those mapping. For instance, when working with graphs (remember that a generic weighted and directed graph $G=(V,E)$ can be seen as its adjacency matrix $A\in \mathbb{R}^{|E| \times |E|}$), many algorithms call Definition 2.11 vertex-pair-query, and the two mappings in Definition 2.12 as degree query and neighbor query. When we have access to both queries, we call that quantum general graph model (Hamoudi and Magniez 2018). This is usually the case in all the literature for quantum algorithms for Hamiltonian simulation, graphs, or algorithms on sparse matrices. These query models are sometimes called in literature “sparse access” to a matrix, as sparsity is often the key to obtain an efficient circuit for encoding such structures without a QRAM. Note that Definition 2.12, sometimes goes under the name “Adjacency array model,” so to stress the difference between the classical adjacency list (where to get the last element o the list we need to read all the previous element) and the quantum case, where we can index the nonzero elements in a row of a matrix as in an array (Dürr et al. 2004).

In the PhD thesis of Prakash (Prakash 2014), he leveraged the definition of 2.9 to allow us to efficiently create superpositions corresponding to the rows of the matrices, i.e. encoding the values of the components of a matrix’ row in the amplitudes of a quantum state. Note that this data structure, which sometimes goes under the name KP-trees (Rebentrost and Lloyd 2018), assumes and extends definition 2.9. In practice, both are called QRAM, and both rely on two (different) tree data structure for their construction. One is a hardware tree that allows to perform the mapping in 2.9, the other is a tree that stores the partial norms of the rows of the matrix, which we will discuss later. We will use the following as definition, but this is actually a theorem. For the proof, we refer to (Prakash 2014) and the appendix of (Kerenidis and Prakash 2017b).

Definition 2.13 (Quantum access to matrices using KP-trees (Kerenidis and Prakash 2017b)) Let $V \in \mathbb{R}^{n \times d}$, there is a data structure - called KP-trees - to store the rows of $V$ such that,

The time to insert, update or delete a single entry $v_{ij}$ is $O(\log^{2}(n))$.
A quantum algorithm with access to the data structure can perform the following unitaries in time $T=O(\log^{2}n)$. \[|i\rangle|0\rangle \to |i\rangle|v_{i}\rangle \forall i \in [n].\] \[|0\rangle \to \sum_{i \in [n]} \left\lVert v_{i}\right\rVert|i\rangle.\]

We report what our QRAM data structure looks like for an input matrix $X$ according to the original definitions in (Prakash 2014, KP16). Each row of the matrix of the dataset is encoded as a tree, where the leaves correspond to the squared values of the matrix elements (along with their sign), while the intermediate nodes store the sum of the sub-tree rooted in each node. Then, in the quantum algorithm we assume to have quantum access to all the nodes of the tree. In ML is common to pre-process the data, with either a polynomial expansion, normalization or scaling of the components, for instance in a way such that each feature has unit variance and zero mean. These model suits perfectly these needs, as these operations can be done before storing the data in the quantum accessible data structure.

Recently, the data structure has been extended to allow space for some improvements in the algorithms. In fact, let $A/\mu = P \circ Q$ a decomposition of the matrix $A$, where the norm of the rows of $P$ and the columns of $Q$ are at most $1$, and the symbol $\circ$ is the Hadamard product. In the original formulation, the factorization chosen corresponded to a choice of $\mu=\|A\|_F$. If there is a quantum circuit allowing creation of quantum states proportional to the rows and the columns of $P$ and $Q$, the runtime of the quantum algorithms based on this improved QRAM data structure can become a function of $\mu$, which can be smaller than the Frobenius norm of $A$. In (Kerenidis and Prakash 2020), they provided such efficient factorization for various choices of $\mu$. In the following we explicitly define a class of functions $\mu$, parameterized by $p \in [0,1]$, that will prove to be very useful in governing the runtime of the quantum algorithms.

Definition 2.14 (Possible parameterization of μ for the KP-trees) For $s_{p}(A) = \max_{i \in [n]} \sum_{j \in [d]} A_{ij}^{p}$, we chose $\mu_p(A)$ to be: \[\mu_p(A)=\min_{p\in [0,1]} (\left\lVert A\right\rVert_{F}, \sqrt{s_{2p}(A)s_{(1-2p)}(A^{T})}).\]

The original definition of QRAM, where $\mu(A)=\|A\|_F$ corresponds to the factorization $A/\|A\|_F = P \circ Q$ where we have $p_{ij} = \frac{a_{ij}}{\left\lVert a_i\right\rVert}$ and $q_{ij}=\frac{\left\lVert a_i\right\rVert}{\left\lVert A\right\rVert_F}$. For the generalized choice of $\mu$ in definition 2.14, it is necessary to store two quantum accessible data structures, that respectively store the rows and the columns of a function of $A$. Instead of storing $a_{ij}$ (along with the sign, which is stored separately), we store $sgn(a_{ij})a_{ij}^p$ and $a^{1-p}_{ij}$. The different terms in the minimum in the definition of $\mu(A)$ correspond to different choices for the data structure for storing $A$. Note that in the worst case, $\mu(A) \leq \left\lVert A\right\rVert_{F} \leq \sqrt{d}$ as we assume that $\left\lVert A\right\rVert=1$. Having a small value for $\mu(A)$ is very important, as this value will appear in the runtime of the quantum algorithms. In this thesis we always assume to have quantum access to matrices which are normalized such that $\left\lVert A\right\rVert \leq 1$.

For details on how to use quantum access to this data structure and proving theorem 2.13, the reader is referred to (Kerenidis and Prakash 2017b) (Appendix) for the original proof, (Kerenidis and Prakash 2020) (theorem 4.4) for details on the choices of $\mu(A)$. More explicit proofs for the creation of quantum states with choices of $\mu$ different than the Frobenius norm can be found in (Chakraborty, Gilyén, and Jeffery 2019) (Lemma 25) and (Gilyén et al. 2019).

To grasp the importance of this model we discuss the hurdles and bottlenecks of doing data analysis on massive datasets. When the data that needs to be processed surpass the size of the available memory, the dataset can only be analyzed with algorithms whose runtime is linear (or at most quadratic) with respect to the size of the dataset. Super-linear computations (like most algorithms based on linear-algebra) are too computationally expensive, as the size of the data is too big to fit in live memory.

Under this settings, quantum computers can offer significant advantages. In fact, the runtime of the whole process of performing a data analysis using quantum computers is given by the time of the preprocessing and constructing quantum access, plus the runtime of the quantum procedure. In practice, we want to write algorithms with a total computational complexity of $\tilde{O}(\left\lVert A\right\rVert_0) + \tilde{O}(poly(\kappa(A), \mu(A), 1/\epsilon,\log(n), \Gamma ))$, where $\kappa(A)$, is the condition number of the matrix, $\epsilon$ the error in the approximation of the calculations, and $\Gamma$ might represent some other parameters that depends on the data, but not on its size. This represents an improvement compared to the runtime of the best classical algorithms in machine learning, which is $O(poly(\left\lVert A\right\rVert_0) \times poly(\kappa(A),1/\epsilon))$. Note that without resorting to randomized linear algebra (as in the case of the dequantizations), these runtimes are lower bounded by the runtime for matrix multiplication and matrix inversion. As the QRAM preparation is computationally easy to implement, (it requires a single or few passes over the dataset, that we can do when we receive it, and it is can be made in parallel) a quantum data analysis can be considerably faster than the classical one. It is clear that, even if the scaling of the quantum algorithm is sub-linear in the data (it is often in fact polylogarithmic in $n$), if we consider in the runtime of the algorithms also the time to build quantum access we ``lose’’ the exponential gap between the classical and the quantum runtime. Nevertheless, the overall computational cost can still largely favour the quantum procedure, as we expect the final runtime of the quantum algorithm to be comparable to the preprocessing time. Moreover, the preprocessing can be made once, when the data is received. For the choice of the data structure that leads to a value of $\mu$ equal to the Frobenius norm of the matrix under consideration, this can be done even on the fly, i.e. while receiving each of the rows of the matrix. For the choice of $\mu$ related to a $p$ norm, the construction of the data structure needs only a few passes over the dataset.

2.7 Retrieving data

In order to retrieve information from a quantum computer, we are going to use some efficient procedures that allow to reconstruct classically the information stored in a quantum state. These procedures can be thought of as cleaver ways of sampling a state $|x\rangle$. The idea for an efficient quantum tomography is that we want to minimize the number of times that the sate $|x\rangle$ is created, and consequently, the number of times we call the process that creates $|x\rangle$.

Much of the current literature in quantum tomography is directed towards reconstructing a classical description of density matrices.

Theorem 2.5 (Efficient quantum tomography (O’Donnell and Wright 2016)) An unknown rank-r mixed state $\rho \in \mathbb{C}^{d \times d}$ can be estimated to error $\epsilon$ in Frobenius distance using $n = O(d/\epsilon^2)$ copies, or to error $\epsilon$ in trace distance using $n=O(rd/\epsilon^2)$ copies.

The quantum algorithms described in this document usually work with pure quantum states. Moreover we assume to have access to the unitary that creates the quantum state that we would like to retrieve, and that we have access to the unitary that creates the state (and that we can control it). Under these conditions, the process of performing tomography is greatly simplified. According to the different error guarantees that we require, we can chose among two procedures.

Theorem 2.6 (Vector state tomography with L2 guarantees (Kerenidis and Prakash 2018)) Given access to unitary $U$ such that $U|0\rangle = |x\rangle$ and its controlled version in time $T(U)$, there is a tomography algorithm with time complexity $O(T(U) \frac{ d \log d }{\epsilon^{2}})$ that produces unit vector $\widetilde{x} \in \mathbb{R}^{d}$ such that $\left\lVert\widetilde{x} - |x\rangle \right\rVert_{2} \leq \epsilon$ with probability at least $(1-1/poly(d))$.

Theorem 2.7 (Vector state tomography with L∞ guarantees (Kerenidis, Landman, and Prakash 2019)) Given access to unitary $U$ such that $U|0\rangle = |x\rangle$ and its controlled version in time $T(U)$, there is a tomography algorithm with time complexity $O(T(U) \frac{ \log d }{\epsilon^{2}})$ that produces unit vector $\widetilde{x} \in \mathbb{R}^{d}$ such that $\left\lVert\widetilde{x} - |x\rangle \right\rVert_{\ell_\infty} \leq \epsilon$ with probability at least $(1-1/poly(d))$.

Note that in both kinds of tomography, dependence on the error in the denominator is quadratic, and this is because of the Hoeffding inequality, i.e. lemma C.2. Another remark on the hypothesis of the algorithms for tomography is that they require a unitary $U$ such that $U|0\rangle\mapsto|x\rangle$ for the $|x\rangle$ in question. Oftentimes, due to the random error in the quantum subroutines used inside the algorithms, this state $|x\rangle$ might slightly change every time. We will see that in this thesis the variance on the state $|x\rangle$ is due to errors in mostly due to the phase estimation procedures. In Section 4.1 we discuss ways of making phase estimation algorithms almost deterministic.

More advanced techniques have been recently developed in (Zhang et al. 2020). There, the authors used the assumption on doing tomography on a state $|x\rangle$ that is in the row space of a rank $r$ matrix $A$ for which we have quantum access. They propose an algorithm to obtain the classical description of the coefficients $x_i$ in the base spanned by the rows $\{A_{i}\}_{i=0}^r$of $A$, so that $|x\rangle = \sum_i^r x_i |A_i\rangle$. This requires $\tilde O(poly(r))$ copies of the output states and $\tilde{O}(poly(r), \kappa^r)$ queries to input oracles. While this procedure has the benefit of not being linear in the output dimension of the final state, the high dependence on the rank might hide the advantages compared to the previous quantum tomography procedures. For completeness, the result is as follows.

Theorem 2.8 (Smart quantum tomography (Zhang et al. 2020)) For the state $|v\rangle$ lies in the row space of a matrix $A \in \mathbb{R}^{n \times d}$ with rank $r$ and condition number $\kappa(A)$, the classical form of $|v\rangle$ can be obtained by using $O(r^3\epsilon^2)$ queries to the state $|v\rangle$, $O(r^{11}\kappa^{5r}\epsilon^{-2}\log(1/\delta))$ queries to QRAM oracles of $A$ and $O(r^2)$ additional inner product operations between rows, such that the $\ell_2$ norm error is bounded in $\epsilon$ with probability at least $1-\delta$.

Other strategies to output data from a quantum computer can be sampling, or (if the output of the computation is just a scalar) amplitude estimation. These techniques can be employed when the output of the computation is a scalar.

Quantum algorithms for data analysis