qiskit-documentation/learning/courses/quantum-machine-learning/data-encoding.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "0210ee7c-f989-4eb0-9231-ae5deacb6091",
   "metadata": {},
   "source": [
    "{/* cspell:ignore backgroundcolor linesize, imshow, minpos, nonumber, zzcircuit, zcircuit, pcircuit, zzcx, Schuld, Francesco, Petruccione, Vojtech, Adrian, Perez, Cervera, Lierta, Elies, Fuster, Jose, Latorre, Sweke, Jakob, Adrián, Pérez-Salinas, Alba, Cervera-Lierta, Elies, Gil-Fuster, José */}\n",
    "\n",
    "# Data encoding\n",
    "\n",
    "## Introduction and notation\n",
    "\n",
    "To use a quantum algorithm, classical data must somehow be brought into a quantum circuit. This is usually referred to as data *encoding*, but is also called data *loading*. Recall from previous lessons the notion of a feature mapping, a mapping of data features from one space to another. Just transferring classical data to a quantum computer is a sort of mapping, and could be called a feature mapping. In practice, the built-in feature mappings in Qiskit (like ZFeatureMap and ZZFeatureMap) will typically include rotation layers and entangling layers that extend the state to many dimensions in the Hilbert space. This encoding process is a critical part of quantum machine learning algorithms and directly affects their computational capabilities.\n",
    "\n",
    "Some of the encoding techniques below can be efficiently classically simulated; this is particularly easy to see in encoding methods that yield product states (i.e. which do not entangle qubits). And remember that quantum utility is most likely to lie where the quantum-like complexity of the dataset is well-matched by the encoding method. So it is very likely that you will end up writing your own encoding circuits. Here, we show a wide variety of possible encoding strategies simply so that you can compare and contrast them, and see what is possible. There are some very general statements that can be made about the usefulness of encoding techniques. For example, EfficientSU2 (see below) with a full entangling scheme is much more likely to capture quantum features of data than methods that yield product states  (like ZFeatureMap). But this does not mean EfficientSU2 is sufficient, or sufficiently well-matched to your dataset, to yield a quantum speed-up. That requires careful consideration of the structure of the data being modeled or classified. There is also a balancing act with circuit depth, since many feature maps which fully entangle the qubits in a circuit yield very deep circuits, too deep to get usable results on today's quantum computers.\n",
    "\n",
    "### Notation\n",
    "\n",
    "A dataset is a set of $M$ data vectors: $\\text{X} = \\{\\vec{x}^{(j)}\\,|\\,j\\in [M]\\}$, where each vector is $N$ dimensional, i.e. $\\vec{x}^{(j)}=(\\vec{x}^{(j)}_1,\\ldots,\\vec{x}^{(j)}_N)\\in\\mathbb{R}^N$. This could be extended to complex data features. In his lesson, we may occassionally use these notations for the full set $(\\text{X}),$ and its specific elements like $\\vec{x}^{(j)}$. But we will mostly refer to the loading of a single vector from our dataset at a time, and will often simply refer to a single vector of $N$ features as $\\vec{x}$.\n",
    "\n",
    "Additionally, it is common to use the symbol $\\Phi(\\vec{x})$ to refer to the feature mapping $\\Phi$ of data vector $\\vec{x}$. However, it is also common to refer to mappings in quantum computing using $U(\\vec{x}),$ a notation that reinforces the unitary nature of these operations. One could correctly use the same symbol for both; both are feature mappings. Throughout this course, we tend to use $\\Phi(\\vec{x})$ when discussing feature mappings in machine learning, generally, and $U(\\vec{x})$ when discussing circuit implementations of feature mappings.\n",
    "\n",
    "### Normalization and information loss\n",
    "\n",
    "In classical machine learning, training data features are often \"normalized\" or rescaled which often improves model performance. One common way of doing this is by using min-max normalization or standardization. In min-max normalization, feature columns of the data matrix $\\text{X}$ (say, feature $k$) are normalized:\n",
    "\n",
    "$$\n",
    "x^{'(i)}_k = \\frac{x^{(i)}_k - \\text{min}\\{x^{(j)}_k\\,|\\,\\vec{x}^{(j)}\\in [\\text{X}]\\}}{\\text{max}\\{x^{(j)}_k\\,|\\,\\vec{x}^{(j)}\\in [\\text{X}]\\}-\\text{min}\\{x^{(j)}_k\\,|\\,\\vec{x}^{(j)}\\in [\\text{X}]\\}}\n",
    "$$\n",
    "\n",
    "where min and max refer to the minimum and maximum of feature $k$ over the $M$ data vectors in the dataset $\\text{X}$. All the feature values then fall in the unit interval: $x^{'(i)}_k \\in [0,1]$ for all $i\\in [M]$, $k\\in[N]$.\n",
    "\n",
    "Normalization is also a fundamental concept in quantum mechanics and quantum computing, but it is slightly different from min-max normalization. Normalization in quantum mechanics requires that the length (in the context of quantum computing, the 2-norm) of a state vector $|\\psi\\rangle$ is equal to unity: $\\|\\psi\\|=\\sqrt{\\langle\\psi|\\psi\\rangle} = 1$, ensuring that measurement probabilities sum to 1. The state is normalized by dividing by the 2-norm; that is, by rescaling\n",
    "$$\n",
    "|\\psi\\rangle\\rightarrow\\|\\psi\\|^{-1}|\\psi\\rangle\n",
    "$$\n",
    "In quantum computing and quantum mechanics, this is not a normalization imposed by people on the data, but a fundamental property of quantum states. Depending on your encoding scheme, this constraint may affect how your data are rescaled. For example, in amplitude encoding (see below), the data vector is normalized $\\vert\\vec{x}^{(j)}\\vert = 1$ as is required by quantum mechanics, and this affects the scaling of the data being encoded. In phase encoding, feature values are recommended to be rescaled as $\\vec{x}^{(j)}_i \\in (0,2\\pi]$ so that there is no information loss due to the modulo-$2\\pi$ effect of encoding to a qubit phase angle[\\[1,2\\]](#references)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "930db404-47dc-4dfd-8ca0-cff02fb7cb90",
   "metadata": {
    "formulas": {
     "_ket-dataset": {
      "meaning": "This is a quantum statevector that represents our dataset, 𝒳.",
      "say": "Ket script X"
     },
     "_ket-x": {
      "meaning": "This vertical bar and angled bracket mean we're referring to a <a href='https://en.wikipedia.org/wiki/Bra%E2%80%93ket_notation'>ket</a> (column vector) with label 'x'.",
      "say": "Ket x"
     },
     "_m": {
      "meaning": "This is the number of entries in our dataset."
     },
     "_sum-m": {
      "meaning": "This notation means we add together |x<sup>m</sup>〉 (the m<sup>th</sup> element of our dataset) for all values of m between 1 and M (i.e., the entire dataset).",
      "say": "Sum of all computational basis states in our dataset (𝒳)"
     },
     "_x-lil-m": {
      "meaning": "This is the m<sup>th</sup> element of the dataset. It is an N-dimensional vector. Here, the 'm' is just used to mean \"any number between 1 and M\"",
      "say": "X little-M"
     },
     "brace": {
      "meaning": "These curly brackets (braces) mean everything inside the brackets forms a <a href='https://en.wikipedia.org/wiki/Set_(mathematics)'>set</a>.",
      "say": "Brace (or \"curly bracket\")",
      "type": "Universal notation"
     },
     "ellipsis": {
      "meaning": "These dots omit things where the pattern can be implied.",
      "say": "Ellipsis",
      "type": "Universal notation"
     },
     "in": {
      "meaning": "This symbol means the things to the left of the symbol are contained in the set to the right of the symbol.",
      "say": "In",
      "type": "Universal notation"
     },
     "script-x": {
      "meaning": "This is a symbol we’ve chosen to represent our dataset.",
      "say": "Script X",
      "type": "Locally defined variable"
     }
    }
   },
   "source": [
    "## Methods of encoding\n",
    "\n",
    "In the next few sections, we will refer to a small example classical dataset $\\text{X}_\\text{ex}$ consisting of $M=5$ data vectors, each with $N=3$ features:\n",
    "\n",
    "$$\n",
    "\\text{X}_{\\text{ex}}=\\{(4,8,5),(9,8,6),(2,9,2),(5,7,0),(3,7,5)\\}\n",
    "$$\n",
    "\n",
    "In the notation introduced above, we might say the $1^\\text{st}$ feature of the $4^\\text{th}$ data vector in our set $\\text{X}_{\\text{ex}}$ is $\\vec{x}^{(4)}_1 = 5,$ for example."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "288c73ec-bdfe-4879-b918-0aed7f0c3c5c",
   "metadata": {},
   "source": [
    "### Basis encoding\n",
    "\n",
    "Basis encoding encodes a classical $P$-bit string into a computational basis state of a $P$-qubit system. Take for example $\\vec{x}^{(1)}_3 = 5 = 0(2^3)+1(2^2)+0(2^1)+1(2^0).$ This can be represented as a $4$-bit string as $(0101)$, and by a $4$-qubit system as the quantum state $|0101\\rangle$. More generally, for a $P$-bit string: $\\vec{x}^{(j)}_k = (b_1, b_2, ... , b_P)$, the corresponding $P$-qubit state is $|x^{(j)}_k\\rangle = | b_1, b_2, ... , b_P \\rangle$ with $b_n \\in \\{0,1\\}$ for $n = 1 , \\dots , P$. Note that this is just for a single feature.\n",
    "\n",
    "If each feature of this data vector is mapped to a quantum state $|x^{(j)}_k\\rangle$, then we can describe a data vector from our set as a superposition of all the computational basis states describing the features of that vector:\n",
    "\n",
    "$$\n",
    "|x^{(j)} \\rangle = \\frac{1}{\\sqrt{N}}\\sum_{k=1}^{N}|x^{(j)}_k \\rangle\n",
    "$$\n",
    "\n",
    "In Qiskit, once we calculate what state will encode our data point, we can use the `initialize` function to prepare it. Consider the 4th data vector in our dataset $\\vec{x}^{(4)} = (5,7,0)$. We have $x^{(4)}_1=101, x^{(4)}_2=111$, and $x^{(4)}_3 = 000$. This is encoded as the state $|x^{(4)}\\rangle= \\frac{1}{\\sqrt{3}}(|101\\rangle+|111\\rangle+|000\\rangle)$.\n",
    "\n",
    "We can generate a circuit that will prepare this state using `initialize`. For this specific case, we will use three qubits. The space of all $2^3$ measurable states of these three qubits is spanned by\n",
    "\n",
    "$$\n",
    "\\vert 000\\rangle, \\vert 001\\rangle, \\vert 010\\rangle, \\vert 011\\rangle, \\vert 100\\rangle, \\vert 101\\rangle, \\vert 110\\rangle, \\vert 111\\rangle\n",
    "$$\n",
    "\n",
    "When specifying the desired state of our 3-qubit system, we specify the amplitude of each of these $2^3$ basis states, in this order. Thus, our desired state will have $1 /\\sqrt{3}$ in the $1^\\text{st}$, $6^\\text{th}$, and $8^\\text{th}$ entries, and zeros everywhere else."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "85ee995f-1e50-4860-a24c-16bbc8b5c8b0",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<Image src=\"/learning/images/courses/quantum-machine-learning/data-encoding/extracted-outputs/85ee995f-1e50-4860-a24c-16bbc8b5c8b0-0.avif\" alt=\"Output of the previous code cell\" />"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import math\n",
    "from qiskit import QuantumCircuit\n",
    "\n",
    "desired_state = [1 / math.sqrt(3), 0, 0, 0, 0, 1 / math.sqrt(3), 0, 1 / math.sqrt(3)]\n",
    "\n",
    "qc = QuantumCircuit(3)\n",
    "qc.initialize(desired_state, [0, 1, 2])\n",
    "qc.decompose(reps=8).draw(output=\"mpl\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b96f0bfc-a892-4f68-84a6-1859381f099d",
   "metadata": {},
   "source": [
    "This example illustrates a couple of disadvantages of basis encoding. While it is simple to understand, the state vectors can become quite sparse, and schemes to implement it are usually not efficient."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "887174c0-6a8b-438d-84cc-80451352d9e9",
   "metadata": {
    "jp-MarkdownHeadingCollapsed": true
   },
   "source": [
    "### Example\n",
    "\n",
    "Write code to encode the first vector in our example data set $\\text{X}_{\\text{ex}}$:\n",
    "\n",
    "$$\\vec{x}^{(1)}=(4,8,5)$$\n",
    "\n",
    "using basis encoding.\n",
    "\n",
    "__Solution:__"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "92450eb4-8737-49ed-b344-b1e82a813fea",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[0, 0, 0, 0, 0.5773502691896258, 0.5773502691896258, 0, 0, 0.5773502691896258, 0, 0, 0, 0, 0, 0, 0]\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "<Image src=\"/learning/images/courses/quantum-machine-learning/data-encoding/extracted-outputs/92450eb4-8737-49ed-b344-b1e82a813fea-1.avif\" alt=\"Output of the previous code cell\" />"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import math\n",
    "from qiskit import QuantumCircuit\n",
    "\n",
    "desired_state = [\n",
    "    0,\n",
    "    0,\n",
    "    0,\n",
    "    0,\n",
    "    1 / math.sqrt(3),\n",
    "    1 / math.sqrt(3),\n",
    "    0,\n",
    "    0,\n",
    "    1 / math.sqrt(3),\n",
    "    0,\n",
    "    0,\n",
    "    0,\n",
    "    0,\n",
    "    0,\n",
    "    0,\n",
    "    0,\n",
    "]\n",
    "\n",
    "print(desired_state)\n",
    "\n",
    "qc = QuantumCircuit(4)\n",
    "qc.initialize(desired_state, [0, 1, 2, 3])\n",
    "qc.decompose(reps=7).draw(output=\"mpl\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c47289a9-64f0-4541-b25c-41b2a51f86ad",
   "metadata": {
    "formulas": {
     "_a-norm": {
      "meaning": "This is a normalization constant. We can calculate it from the inverse of the Euclidean (l2) norm of the datapoint vector.",
      "say": "A norm"
     }
    }
   },
   "source": [
    "### Amplitude encoding\n",
    "\n",
    "Amplitude encoding encodes data into the amplitudes of a quantum state. It represents a normalized classical $N$-dimensional data vector, $\\vec{x}^{(j)}$, as the amplitudes of a $n$-qubit quantum state, $|\\psi_x\\rangle$:\n",
    "\n",
    "$$\n",
    "|\\psi^{(j)}_x\\rangle = \\frac{1}{\\alpha}\\sum_{i=1}^N x^{(j)}_i |i\\rangle\n",
    "$$\n",
    "\n",
    "where $N$ is the same dimension of the data vectors as before, $\\vec{x}^{(j)}_i$ is the $i^{th}$ element of $\\vec{x}^{(j)}$ and $|i\\rangle$ is the $i^{th}$ computational basis state. Here, $\\alpha$ is a normalization constant to be determined from the data being encoded. This is the normalization condition imposed by quantum mechanics:\n",
    "\n",
    "$$\n",
    "\\sum_{i=1}^N \\left|x^{(j)}_i\\right|^2 = \\left|\\alpha\\right|^2.\n",
    "$$\n",
    "\n",
    "In general, this is a different condition than the min/max normalization used for each feature across all data vectors. Precisely how this is navigated will depend on your problem. But there is no way around the quantum mechanical normalization condition above.\n",
    "\n",
    "In amplitude encoding, each feature in a data vector is stored as an amplitude of a different quantum state. As a system of $n$ qubits provides $2^n$ amplitudes, amplitude encoding of $N$ features requires $n \\ge \\mathrm{log}_2(N)$ qubits.\n",
    "\n",
    "As an example, let's encode the first vector in our example dataset $\\text{X}_\\text{ex}$, $\\vec{x}^{(1)} = (4,8,5)$ using amplitude encoding. Normalizing the resulting vector, we get:\n",
    "\n",
    "$$\n",
    "\\sum_{i=1}^N \\left|x^{(1)}_i\\right|^2 = 4^2+8^2+5^2 = 105 = \\left|\\alpha\\right|^2 \\rightarrow \\alpha = \\sqrt{105}\n",
    "$$\n",
    "\n",
    "and the resulting 2-qubit quantum state would be:\n",
    "\n",
    "$$\n",
    "|\\psi(\\vec{x}^{(1)})\\rangle = \\frac{1}{\\sqrt{105}}(4|00\\rangle+8|01\\rangle+5|10\\rangle+0|11\\rangle)\n",
    "$$\n",
    "\n",
    "In the example above, the number of features in the vector $N=3$, is not a power of 2. When $N$ is not a power of 2, we simply choose a value for the number of qubits $n$ such that $2^n\\geq N$ and pad the amplitude vector with uninformative constants (here, a zero).\n",
    "\n",
    "Like in basis encoding, once we calculate what state will encode our dataset, in Qiskit we can use the `initialize` function to prepare it:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "19810c6d-8d60-49ee-bd6f-6f6fbd5e7363",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<Image src=\"/learning/images/courses/quantum-machine-learning/data-encoding/extracted-outputs/19810c6d-8d60-49ee-bd6f-6f6fbd5e7363-0.avif\" alt=\"Output of the previous code cell\" />"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "desired_state = [\n",
    "    1 / math.sqrt(105) * 4,\n",
    "    1 / math.sqrt(105) * 8,\n",
    "    1 / math.sqrt(105) * 5,\n",
    "    1 / math.sqrt(105) * 0,\n",
    "]\n",
    "\n",
    "qc = QuantumCircuit(2)\n",
    "qc.initialize(desired_state, [0, 1])\n",
    "\n",
    "qc.decompose(reps=5).draw(output=\"mpl\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "58509d89-68ba-4fd5-92b9-278c47497eb9",
   "metadata": {},
   "source": [
    "An advantage of amplitude encoding is the aforementioned requirement of only $\\mathrm{log}_2(N)$ qubits to encode. However, subsequent algorithms must operate on the amplitudes of a quantum state, and methods to prepare and measure the quantum states tend not to be efficient."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "639dd02e-28ad-4d82-8091-ddc00d066666",
   "metadata": {},
   "source": [
    "### Example\n",
    "\n",
    "Write down the normalized state for encoding the following vector (made of two vectors from our example dataset): $\\vec{x}=(9,8,6,2,9,2)$ using amplitude encoding.\n",
    "\n",
    "__Solution:__\n",
    "\n",
    "To encode 6 numbers, we will need to have at least 6 available states on whose amplitudes we can encode. This will require 3 qubits. Using an unknown normalization factor $\\alpha$, we can write this as:\n",
    "\n",
    "$$\n",
    "|\\psi\\rangle = \\alpha(9|000\\rangle+8|001\\rangle+6|010\\rangle+2|011\\rangle+9|100\\rangle+2|101\\rangle+0|110\\rangle+0|111\\rangle)\n",
    "$$\n",
    "Note that\n",
    "$$\n",
    "\\langle \\psi|\\psi\\rangle = |\\alpha|^2\\times(9^2+8^2+6^2+2^2+9^2+2^2+0^2+0^2) = |\\alpha|^2\\times(270)=1 \\rightarrow \\alpha = \\frac{1}{\\sqrt{270}}\n",
    "$$\n",
    "So finally,\n",
    "$$\n",
    "|\\psi\\rangle = \\frac{1}{\\sqrt{270}}(9|000\\rangle+8|001\\rangle+6|010\\rangle+2|011\\rangle+9|100\\rangle+2|101\\rangle+0|110\\rangle+0|111\\rangle)\n",
    "$$\n",
    "\n",
    "\n",
    "### Example\n",
    "\n",
    "For the same data vector $\\vec{x}=(9,8,6,2,9,2),$ write code to create a circuit that loads these data features using amplitude encoding.\n",
    "\n",
    "__Solution:__"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "9ccad55c-4f5d-42a8-9c95-6f7220dd500d",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[0.5477225575051662, 0.48686449556014766, 0.36514837167011077, 0.12171612389003691, 0.5477225575051662, 0.12171612389003691, 0, 0]\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "<Image src=\"/learning/images/courses/quantum-machine-learning/data-encoding/extracted-outputs/9ccad55c-4f5d-42a8-9c95-6f7220dd500d-1.avif\" alt=\"Output of the previous code cell\" />"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "desired_state = [\n",
    "    9 / math.sqrt(270),\n",
    "    8 / math.sqrt(270),\n",
    "    6 / math.sqrt(270),\n",
    "    2 / math.sqrt(270),\n",
    "    9 / math.sqrt(270),\n",
    "    2 / math.sqrt(270),\n",
    "    0,\n",
    "    0,\n",
    "]\n",
    "\n",
    "print(desired_state)\n",
    "\n",
    "qc = QuantumCircuit(3)\n",
    "qc.initialize(desired_state, [0, 1, 2])\n",
    "qc.decompose(reps=8).draw(output=\"mpl\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "950212b6-3ce5-4473-acff-3ac1871ccc1b",
   "metadata": {},
   "source": [
    "### Example\n",
    "\n",
    "You may need to deal with very large data vectors. Consider the vector\n",
    "\n",
    "$$\n",
    "\\vec{x}=(4,8,5,9,8,6,2,9,2,5,7,0,3,7,5).\n",
    "$$\n",
    "\n",
    "Write code to automate the normalization, and generate a quantum circuit for amplitude encoding.\n",
    "\n",
    "__Solution:__\n",
    "\n",
    "There are many possible answers. Here is code that prints a few steps along the way:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8d39e2c8-cf58-4247-bbaa-891811fb9800",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Normalized array:\n",
      "[0.17342199 0.34684399 0.21677749 0.39019949 0.34684399 0.26013299\n",
      " 0.086711   0.39019949 0.086711   0.21677749 0.30348849 0.\n",
      " 0.1300665  0.30348849 0.21677749 0.        ]\n",
      "\n",
      "[0, 1, 2, 3]\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "<Image src=\"/learning/images/courses/quantum-machine-learning/data-encoding/extracted-outputs/8d39e2c8-cf58-4247-bbaa-891811fb9800-1.avif\" alt=\"Output of the previous code cell\" />"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import numpy as np\n",
    "from math import sqrt\n",
    "\n",
    "init_list = [4, 8, 5, 9, 8, 6, 2, 9, 2, 5, 7, 0, 3, 7, 5]\n",
    "qubits = round(np.log(len(init_list)) / np.log(2) + 0.4999999999)\n",
    "need_length = 2**qubits\n",
    "pad = need_length - len(init_list)\n",
    "for i in range(0, pad):\n",
    "    init_list.append(0)\n",
    "\n",
    "init_array = np.array(init_list)  # Unnormalized data vector\n",
    "length = sqrt(\n",
    "    sum(init_array[i] ** 2 for i in range(0, len(init_array)))\n",
    ")  # Vector length\n",
    "norm_array = init_array / length  # Normalized array\n",
    "print(\"Normalized array:\")\n",
    "print(norm_array)\n",
    "print()\n",
    "\n",
    "qubit_numbers = []\n",
    "for i in range(0, qubits):\n",
    "    qubit_numbers.append(i)\n",
    "print(qubit_numbers)\n",
    "\n",
    "qc = QuantumCircuit(qubits)\n",
    "qc.initialize(norm_array, qubit_numbers)\n",
    "qc.decompose(reps=7).draw(output=\"mpl\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b999d74e-fd81-4c13-a6c7-a0fbc0835fc5",
   "metadata": {},
   "source": [
    "### Check-in question\n",
    "\n",
    "<details>\n",
    "<summary>\n",
    "Do you see advantages to amplitude encoding over basis encoding? If so, explain.\n",
    "</summary>\n",
    "Answer:\n",
    "\n",
    "There may be several answers. One answer is that, given the fixed ordering of the basis states, this amplitude encoding preserves the order of the numbers encoded. It will often also be encoded more densely.\n",
    "</details>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cbedf6cc-33a8-4be7-90ec-81af4d224871",
   "metadata": {},
   "source": [
    "A benefit of amplitude encoding is that only $\\log_2(N)$ qubits are required for an $N$-dimensional ($N$-feature) data vector $\\vec{x}\\rightarrow|\\vec{x}\\rangle$. However, amplitude encoding is generally an inefficient procedure that requires arbitrary state preparation, which is exponential in the number of CNOT gates. Stated differently, the state preparation has a polynomial runtime complexity of $\\mathcal O(N)$ in the number of dimensions, where $N = 2^n$, and $n$ is the number of qubits. Amplitude encoding “provides an exponential saving in space at the cost of an exponential increase in time”[\\[3\\]](#references); however, runtime increases to $\\mathcal O(\\log N)$ are achievable in certain cases[\\[4\\]](#references). For an end-to-end quantum speedup, the data loading runtime complexity needs to be considered."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4ef15d5f-4730-4ea8-95fd-75aff06487ff",
   "metadata": {
    "formulas": {
     "_big-o-times-n": {
      "meaning": "This represents the tensor product operation over N qubits.",
      "say": "big o-times"
     },
     "_big-o-times-n2": {
      "meaning": "This represents the tensor product operation over N/2 qubits.",
      "say": "big o-times"
     }
    }
   },
   "source": [
    "### Angle encoding\n",
    "\n",
    "Angle encoding is of interest in many QML models using Pauli feature maps such as quantum support vector machines (QSVMs) and variational quantum circuits (VQCs), among others. Angle encoding is closely related to phase encoding and dense angle encoding which are presented below. Here we will use \"angle encoding\" to refer to a rotation in $\\theta$, that is, a rotation away from the $z$ axis accomplished for example by an $R_X$ gate or an $R_Y$ gate[\\[1,3\\]](#references). Really, one can encode data in *any* rotation or combination of rotations. But $R_Y$ is common in the literature, so we emphasize it here.\n",
    "\n",
    "When applied to a single qubit, angle encoding imparts a Y-axis rotation proportional to the data value. Consider the encoding of a single ($k^\\text{th}$)feature from the $j^\\text{th}$ data vector in a dataset, $\\vec{x}^{(j)}_k$:\n",
    "\n",
    "$$\n",
    "|\\vec{x}^{(j)}_k\\rangle = R_Y(\\theta=\\vec{x}^{(j)}_k)|0\\rangle = \\textstyle\\cos\\left(\\frac{\\vec{x}^{(j)}_k}{2}\\right)|0\\rangle + \\sin\\left(\\frac{\\vec{x}^{(j)}_k}{2}\\right)|1\\rangle.\n",
    "$$\n",
    "\n",
    "Alternatively, angle encoding can be performed using $R_X(\\theta)$ gates, although the encoded state would have a complex relative phase compared to $R_Y(\\theta)$.\n",
    "\n",
    "Angle encoding is different from the previous two methods discussed in several ways. In angle encoding:\n",
    "- Each feature value is mapped to a corresponding qubit, $\\vec{x}^{(j)}_k \\rightarrow Q_k$, leaving the qubits in a product state.\n",
    "- One numerical value is encoded at a time, rather than a whole set of features from a data point.\n",
    "- $n$ qubits are required for $N$ data features, where $n\\leq N$. Often equality holds, here. We'll see how $n<N$ is possible in the next few sections.\n",
    "- The resulting circuit is a constant depth (typically the depth is 1 prior to transpilation).\n",
    "\n",
    "The constant depth quantum circuit makes it particularly amenable to current quantum hardware. One additional feature of encoding our data using $\\theta$ (and specifically, our choice to use Y-axis angle encoding) is that it creates real-valued quantum states that can be useful for certain applications. For Y-axis rotation, data is mapped with a Y-axis rotation gate $R_Y(\\theta)$ by a real-valued angle $\\theta \\in (0, 2\\pi]$ ([Qiskit RYGate](/docs/api/qiskit/qiskit.circuit.library.RYGate)). As with phase encoding (see below), we recommend that you rescale data so that $\\vec{x}^{(j)}_k \\in (0,2\\pi]$, preventing information loss and other unwanted effects.\n",
    "\n",
    "The following Qiskit code rotates a single qubit from an initial state $|0\\rangle$ to encode a data value $\\vec{x}^{(j)}_k=\\frac{1}{2}\\pi$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "5827fe1c-7e88-4248-950c-ae843582730c",
   "metadata": {},
   "outputs": [],
   "source": [
    "from qiskit.quantum_info import Statevector\n",
    "from math import pi\n",
    "\n",
    "qc = QuantumCircuit(1)\n",
    "state1 = Statevector.from_instruction(qc)\n",
    "qc.ry(pi / 2, 0)  # Phase gate rotates by an angle pi/2\n",
    "state2 = Statevector.from_instruction(qc)\n",
    "states = state1, state2"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5e4bb568-de99-4ddf-ac42-6bd958fcda2f",
   "metadata": {},
   "source": [
    "We will define a function to visualize the action on the state vector. The details of the function definition are not important, but the ability to visualize the state vectors and their changes is important."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "666700f7-7798-43ce-a8ca-d91e48adda4f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<Image src=\"/learning/images/courses/quantum-machine-learning/data-encoding/extracted-outputs/666700f7-7798-43ce-a8ca-d91e48adda4f-0.avif\" alt=\"Output of the previous code cell\" />"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from qiskit.visualization.bloch import Bloch\n",
    "from qiskit.visualization.state_visualization import _bloch_multivector_data\n",
    "\n",
    "\n",
    "def plot_Nstates(states, axis, plot_trace_points=True):\n",
    "    \"\"\"This function plots N states to 1 Bloch sphere\"\"\"\n",
    "    bloch_vecs = [_bloch_multivector_data(s)[0] for s in states]\n",
    "\n",
    "    if axis is None:\n",
    "        bloch_plot = Bloch()\n",
    "    else:\n",
    "        bloch_plot = Bloch(axes=axis)\n",
    "\n",
    "    bloch_plot.add_vectors(bloch_vecs)\n",
    "\n",
    "    if len(states) > 1:\n",
    "\n",
    "        def rgba_map(x, num):\n",
    "            g = (0.95 - 0.05) / (num - 1)\n",
    "            i = 0.95 - g * num\n",
    "            y = g * x + i\n",
    "            return (0.0, y, 0.0, 0.7)\n",
    "\n",
    "        num = len(states)\n",
    "        bloch_plot.vector_color = [rgba_map(x, num) for x in range(1, num + 1)]\n",
    "\n",
    "    bloch_plot.vector_width = 3\n",
    "    bloch_plot.vector_style = \"simple\"\n",
    "\n",
    "    if plot_trace_points:\n",
    "\n",
    "        def trace_points(bloch_vec1, bloch_vec2):\n",
    "            # bloch_vec = (x,y,z)\n",
    "            n_points = 15\n",
    "            thetas = np.arccos([bloch_vec1[2], bloch_vec2[2]])\n",
    "            phis = np.arctan2(\n",
    "                [bloch_vec1[1], bloch_vec2[1]], [bloch_vec1[0], bloch_vec2[0]]\n",
    "            )\n",
    "            if phis[1] < 0:\n",
    "                phis[1] = phis[1] + 2 * pi\n",
    "            angles0 = np.linspace(phis[0], phis[1], n_points)\n",
    "            angles1 = np.linspace(thetas[0], thetas[1], n_points)\n",
    "\n",
    "            xp = np.cos(angles0) * np.sin(angles1)\n",
    "            yp = np.sin(angles0) * np.sin(angles1)\n",
    "            zp = np.cos(angles1)\n",
    "            pnts = [xp, yp, zp]\n",
    "            bloch_plot.add_points(pnts)\n",
    "            bloch_plot.point_color = \"k\"\n",
    "            bloch_plot.point_size = [4] * len(bloch_plot.points)\n",
    "            bloch_plot.point_marker = [\"o\"]\n",
    "\n",
    "        for i in range(len(bloch_vecs) - 1):\n",
    "            trace_points(bloch_vecs[i], bloch_vecs[i + 1])\n",
    "\n",
    "    bloch_plot.sphere_alpha = 0.05\n",
    "    bloch_plot.frame_alpha = 0.15\n",
    "    bloch_plot.figsize = [4, 4]\n",
    "\n",
    "    bloch_plot.render()\n",
    "\n",
    "\n",
    "plot_Nstates(states, axis=None, plot_trace_points=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6ba88e2e-4774-4a78-9971-b8b1df927229",
   "metadata": {},
   "source": [
    "That was just a single feature of a single data vector. When encoding $N$ features into the rotation angles of $n$ qubits, say for the $j^\\text{th}$ data vector $\\vec{x}^{(j)} = (x_1,...,x_N),$ the encoded product state will look like this:\n",
    "\n",
    "$$\n",
    "|\\vec{x}^{(j)}\\rangle = \\bigotimes^N_{k=1} \\cos(\\vec{x}^{(j)}_k)|0\\rangle + \\sin(\\vec{x}^{(j)}_k)|1\\rangle\n",
    "$$\n",
    "\n",
    "We note that this is equivalent to\n",
    "\n",
    "$$\n",
    "|\\vec{x}^{(j)}\\rangle = \\bigotimes^N_{k=1} R_Y(2\\vec{x}^{(j)}_k)|0\\rangle.\n",
    "$$\n",
    "\n",
    "### Example\n",
    "\n",
    "Encode the data vector $\\vec{x}^{(j)} = (0, \\pi/4, \\pi/2)$ using angle encoding, as described above.\n",
    "\n",
    "__Solution:__"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "cb916512-c004-4000-a83a-8c81ab430e74",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<Image src=\"/learning/images/courses/quantum-machine-learning/data-encoding/extracted-outputs/cb916512-c004-4000-a83a-8c81ab430e74-0.avif\" alt=\"Output of the previous code cell\" />"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "qc = QuantumCircuit(3)\n",
    "qc.ry(0, 0)\n",
    "qc.ry(2 * math.pi / 4, 1)\n",
    "qc.ry(2 * math.pi / 2, 2)\n",
    "qc.draw(output=\"mpl\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f4e84936-2d60-40fb-98f0-461d62d3f63f",
   "metadata": {},
   "source": [
    "### Check-in questions\n",
    "\n",
    "<details>\n",
    "<summary>\n",
    "Using angle encoding as described above, how many qubits are required to encode 5 features?\n",
    "</summary>\n",
    "Answer: 5\n",
    "</details>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "64cd9b41-00f5-4212-b52e-939ca96f84c7",
   "metadata": {},
   "source": [
    "### Phase encoding\n",
    "\n",
    "Phase encoding is very similar to the angle encoding described above. The phase angle of a qubit is a real-valued angle $\\phi$ about the $z$-axis from the +$x$-axis. Data are mapped with a phase rotation, $P(\\phi) = e^{i\\phi/2}R_Z(\\phi)$, where $\\phi \\in (0,2\\pi]$ (see [Qiskit PhaseGate](/docs/api/qiskit/qiskit.circuit.library.PhaseGate) for more information). It is recommended to rescale data so that $\\vec{x}^{(j)}_k \\in (0,2\\pi]$. This prevents information loss and other potentially unwanted effects[\\[1,2\\]](#references).\n",
    "\n",
    "A qubit is often initialized in the state $|0\\rangle$, which is an eigenstate of the phase rotation operator, meaning that the qubit state first needs to be rotated for phase encoding to be implemented. It therefore makes sense to initialize the state with a Hadamard gate: $H|0\\rangle = |+\\rangle = \\textstyle\\frac{1}{\\sqrt{2}}(|0\\rangle + |1\\rangle)$. Phase encoding on a single qubit means imparting a relative phase proportional to the data value:\n",
    "\n",
    "$$\n",
    "\\begin{equation}|\\vec{x}^{(j)}_k\\rangle = P(\\phi=\\vec{x}^{(j)}_k)|+\\rangle = \\textstyle\\frac{1}{\\sqrt{2}}\\big(|0\\rangle + e^{i\\vec{x}^{(j)}_k}|1\\rangle\\big).\n",
    "\\end{equation}\n",
    "$$\n",
    "\n",
    "The phase encoding procedure maps each feature value to the phase of a corresponding qubit, $\\vec{x}^{(j)}_k \\rightarrow Q_k$. In total, phase encoding has a circuit depth of 2, including the Hadamard layer, which makes it an efficient encoding scheme. The phase-encoded multi-qubit state ($n$ qubits for $N=n$ features) is a product state:\n",
    "\n",
    "$$\n",
    "\\begin{equation}\n",
    "|\\vec{x}^{(j)}\\rangle = \\bigotimes_{k=1}^{N} P_k(\\phi = \\vec{x}^{(j)}_k)|+\\rangle^{\\otimes N} = {\\textstyle\\frac{1}{\\sqrt{2^N}}} \\bigotimes_{k=1}^{N}\\big(|0\\rangle + e^{i\\vec{x}^{(j)}_k}|1\\rangle\\big).\n",
    "\\end{equation}\n",
    "$$\n",
    "\n",
    "The following Qiskit code first prepares the initial state of a single qubit by rotating it with a Hadamard gate, then rotates it again using a phase gate to encode a data feature $\\vec{x}^{(j)}_k=\\frac{1}{2}\\pi$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "ba0886eb-1c56-4b15-a731-d94d805254e1",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<Image src=\"/learning/images/courses/quantum-machine-learning/data-encoding/extracted-outputs/ba0886eb-1c56-4b15-a731-d94d805254e1-0.avif\" alt=\"Output of the previous code cell\" />"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "qc = QuantumCircuit(1)\n",
    "qc.h(0)  # Hadamard gate rotates state down to Bloch equator\n",
    "state1 = Statevector.from_instruction(qc)\n",
    "\n",
    "qc.p(pi / 2, 0)  # Phase gate rotates by an angle pi/2\n",
    "state2 = Statevector.from_instruction(qc)\n",
    "\n",
    "states = state1, state2\n",
    "\n",
    "qc.draw(\"mpl\", scale=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "02ee25db-7368-40d0-9ba8-52e231ede3e0",
   "metadata": {},
   "source": [
    "We can visualize the rotation in $\\phi$ using the plot_Nstates function we defined."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "f7c9cf29-2ad6-43af-a7e3-e590e41d7e67",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<Image src=\"/learning/images/courses/quantum-machine-learning/data-encoding/extracted-outputs/f7c9cf29-2ad6-43af-a7e3-e590e41d7e67-0.avif\" alt=\"Output of the previous code cell\" />"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "plot_Nstates(states, axis=None, plot_trace_points=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "803fcc6b-a6c2-463d-8e46-e83074faf025",
   "metadata": {},
   "source": [
    "The Bloch sphere plot shows the Z-axis rotation $|+\\rangle \\rightarrow P(\\frac{1}{2}\\pi)|+\\rangle$ where $\\vec{x}^{(j)}_k=\\frac{1}{2}\\pi$. The light green arrow shows the final state.\n",
    "\n",
    "Phase encoding is used in many quantum feature maps, particularly $Z$ and $ZZ$ feature maps, and general Pauli feature maps, among others."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f9f2786b-9fff-4425-8387-0a60ce690689",
   "metadata": {},
   "source": [
    "### Check-in questions\n",
    "<details>\n",
    "<summary>\n",
    "How many qubits are required in order to use phase encoding as described above to store 8 features?\n",
    "</summary>\n",
    "Answer: 8\n",
    "</details>\n",
    "\n",
    "### Example\n",
    "\n",
    "Write code to the vector (4,8,5,9,8,6,2,9,2,5,7,0) using phase encoding.\n",
    "\n",
    "__Solution:__\n",
    "\n",
    "There may be many answers. Here is one example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "a95d5081-1bb5-47a6-93c1-0a57b0394781",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<Image src=\"/learning/images/courses/quantum-machine-learning/data-encoding/extracted-outputs/a95d5081-1bb5-47a6-93c1-0a57b0394781-0.avif\" alt=\"Output of the previous code cell\" />"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "phase_data = [4, 8, 5, 9, 8, 6, 2, 9, 2, 5, 7, 0]\n",
    "qc = QuantumCircuit(len(phase_data))\n",
    "for i in range(0, len(phase_data)):\n",
    "    qc.h(i)\n",
    "    qc.rz(phase_data[i] * 2 * math.pi / float(max(phase_data)), i)\n",
    "qc.draw(output=\"mpl\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "910625ad-d798-4a78-835c-2b92a046dbb1",
   "metadata": {},
   "source": [
    "### Dense angle encoding\n",
    "\n",
    "Dense angle encoding (DAE) is a combination of angle encoding and phase encoding. DAE allows two feature values to be encoded in a single qubit: one angle with a Y-axis rotation angle, and the other with a $z$-axis rotation angle: $\\vec{x}^{(j)}_k,$ $\\vec{x}^{(j)}_\\ell \\rightarrow \\theta, \\phi$. It encodes two features as follows:\n",
    "\n",
    "$$\n",
    "\\begin{equation}\n",
    "|\\vec{x}^{(j)}_k,\\vec{x}^{(j)}_\\ell\\rangle = R_Z(\\phi=\\vec{x}^{(j)}_\\ell) R_Y(\\theta=\\vec{x}^{(j)}_k)|0\\rangle = \\cos\\left(\\frac{\\vec{x}^{(j)}_k}{2}\\right)|0\\rangle + e^{i\\vec{x}^{(j)}_\\ell} \\sin\\left(\\frac{\\vec{x}^{(j)}_k}{2}\\right)|1\\rangle.\n",
    "\\end{equation}\n",
    "$$\n",
    "\n",
    "Encoding two data features to one qubit results in a $2\\times$ reduction in the number of qubits required for the encoding. Extending this to more features, the data vector $\\vec{x} = (x_1,...,x_N)$ can be encoded as:\n",
    "\n",
    "$$\n",
    "|\\vec{x}\\rangle = \\bigotimes_{k=1}^{N/2} \\cos(x_{2k-1})|0\\rangle + e^{i x_{2k}}\\sin(x_{2k-1})|1\\rangle\n",
    "$$\n",
    "\n",
    "DAE can be generalized to arbitrary functions of the two features instead of the sinusoidal functions used here. This is called general qubit encoding[\\[7\\]](#references).\n",
    "\n",
    "As an example of DAE, the code below encodes and visualizes the encoding of the features $x_1=\\theta = 3\\pi/8$ and $x_2=\\phi = 7\\pi/4$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "9a6bb041-d7a1-4e29-a463-81b93b900e96",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<Image src=\"/learning/images/courses/quantum-machine-learning/data-encoding/extracted-outputs/9a6bb041-d7a1-4e29-a463-81b93b900e96-0.avif\" alt=\"Output of the previous code cell\" />"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "qc = QuantumCircuit(1)\n",
    "state1 = Statevector.from_instruction(qc)\n",
    "qc.ry(3 * pi / 8, 0)\n",
    "state2 = Statevector.from_instruction(qc)\n",
    "qc.rz(7 * pi / 4, 0)\n",
    "state3 = Statevector.from_instruction(qc)\n",
    "states = state1, state2, state3\n",
    "\n",
    "plot_Nstates(states, axis=None, plot_trace_points=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cdc69e45-3651-453b-9e04-5e6f6c82f8b1",
   "metadata": {},
   "source": [
    "### Check-in questions\n",
    "\n",
    "<details>\n",
    "<summary>\n",
    "Given the treatment above, how many qubits are needed to encode 6 features using dense encoding?\n",
    "</summary>\n",
    "Answer: 3\n",
    "</details>\n",
    "\n",
    "\n",
    "### Example\n",
    "\n",
    "Write code to load the vector (4,8,5,9,8,6,2,9,2,5,7,0,3,7,5) using dense angle encoding.\n",
    "\n",
    "__Solution:__\n",
    "\n",
    "Note that we have padded the list with a \"0\" to avoid the problem of there being a single unused parameter in our encoding scheme."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "f34c9682-dc62-4a30-88ca-72fe28eff0a5",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<Image src=\"/learning/images/courses/quantum-machine-learning/data-encoding/extracted-outputs/f34c9682-dc62-4a30-88ca-72fe28eff0a5-0.avif\" alt=\"Output of the previous code cell\" />"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dense_data = [4, 8, 5, 9, 8, 6, 2, 9, 2, 5, 7, 0, 3, 7, 5, 0]\n",
    "qc = QuantumCircuit(int(len(dense_data) / 2))\n",
    "entry = 0\n",
    "for i in range(0, int(len(dense_data) / 2)):\n",
    "    qc.ry(dense_data[entry] * 2 * math.pi / float(max(dense_data)), i)\n",
    "    entry = entry + 1\n",
    "    qc.rz(dense_data[entry] * 2 * math.pi / float(max(dense_data)), i)\n",
    "    entry = entry + 1\n",
    "qc.draw(output=\"mpl\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1035f985-3ab9-476d-8173-c4b0a764d46f",
   "metadata": {},
   "source": [
    "## Encoding with built-in feature maps\n",
    "\n",
    "### Encoding at arbitrary points\n",
    "\n",
    "Angle encoding, phase encoding, and dense encoding prepared product states with a feature encoded on each qubit (or two features per qubit). This is different from basis encoding and amplitude encoding, in that those methods make use of entangled states. There is not a 1:1 correspondence between data feature and qubit. In amplitude encoding, for example, you might have one feature as the amplitude of the state $|01\\rangle$ and another feature as the amplitude for $|10\\rangle$. Generally, methods that encode in product states yield shallower circuits and can store 1 or 2 features on each qubit. Methods that use entanglement and associate a feature with a state rather than a qubit result in deeper circuits, and can store more features per qubit on average.\n",
    "\n",
    "But encoding need not be entirely in product states or entirely in entangled states as in amplitude encoding. Indeed, many encoding schemes built into Qiskit allow encoding both before and after an entanglement layer, as opposed to just at the beginning. This is known as \"data reuploading\". For related work, see references [5] and  [6].\n",
    "\n",
    "In this section, we will use and visualize a few of the built-in encoding schemes. All the methods in this section encode $N$ features as rotations on $N$ parameterized gates on $n$ qubits, where $n \\leq N$. Note that maximizing data loading for a given number of qubits is not the only consideration. In many cases, circuit depth may be an even more important consideration than qubit count."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0f95bc3b-bff1-469f-92ff-b2f6f523b978",
   "metadata": {},
   "source": [
    "### EfficientSU2\n",
    "\n",
    "A common and useful example of encoding with entanglement is Qiskit's [`EfficientSU2`](/docs/api/qiskit/qiskit.circuit.library.EfficientSU2) circuit. Impressively, this circuit can, for example, encode 8 features on only 2 qubits. Let's see this, and then try to understand how it is possible."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "6b657226-ae95-41f6-b78b-5def930d0080",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<Image src=\"/learning/images/courses/quantum-machine-learning/data-encoding/extracted-outputs/6b657226-ae95-41f6-b78b-5def930d0080-0.avif\" alt=\"Output of the previous code cell\" />"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from qiskit.circuit.library import EfficientSU2\n",
    "\n",
    "circuit = EfficientSU2(num_qubits=2, reps=1, insert_barriers=True)\n",
    "circuit.decompose().draw(output=\"mpl\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "93e9e6ae-85e8-4077-9c25-f947daebe610",
   "metadata": {},
   "source": [
    "Up to the first barrier (a point we label $b1$), our states are:\n",
    "\n",
    "$$\n",
    "|\\psi\\rangle_{b1} = \\left(\\cos(\\theta_0)|0\\rangle+\\sin(\\theta_0)e^{i\\theta_2}|1\\rangle\\right)\\otimes\\left(\\cos(\\theta_1)|0\\rangle+\\sin(\\theta_1)e^{i\\theta_3}|1\\rangle\\right)\n",
    "$$\n",
    "\n",
    "That's just dense encoding, which we've seen before. Now after the CNOT gate, at the second barrier ($b2$), our state is\n",
    "\n",
    "$$\n",
    "|\\psi\\rangle_{b2} = \\left(\\cos(\\theta_0)|0\\rangle+\\sin(\\theta_0)e^{i\\theta_2}|1\\rangle\\right)\\otimes\\cos(\\theta_1)|0\\rangle+\n",
    "\\left(\\sin(\\theta_0)e^{i\\theta_2}|0\\rangle+\\cos(\\theta_0)|1\\rangle\\right)\\otimes\\sin(\\theta_1)e^{i\\theta_3}|1\\rangle\n",
    "$$\n",
    "We now apply the last set of rotations to obtain:\n",
    "$$\n",
    "\\begin{align}\n",
    "\\nonumber\n",
    "|\\psi\\rangle_{\\text{final}} &= \\left(\\cos(\\theta_0)|0\\rangle+\\sin(\\theta_0)e^{i\\theta_2}|1\\rangle\\right)\\otimes\\cos(\\theta_1)\\left(\\cos(\\theta_4)|0\\rangle+\\sin(\\theta_4)e^{i\\theta_6}|1\\rangle\\right)\\\\\\nonumber\n",
    "&+\\left(\\sin(\\theta_0)e^{i\\theta_2}|0\\rangle+\\cos(\\theta_0)|1\\rangle\\right)\\otimes\\sin(\\theta_1)e^{i\\theta_3}\\left(\\cos(\\theta_5)|1\\rangle+\\sin(\\theta_5)e^{i\\theta_7}|0\\rangle\\right)\\nonumber\n",
    "\\end{align}\n",
    "$$\n",
    "At first glance, it may appear that we have loaded so more parameters onto just a few states than makes sense, since the final state can be written as $\\psi_\\text{final} = c_0|00\\rangle+c_1|01\\rangle+c_2|10\\rangle+c_3|11\\rangle$. But note that each prefactor is complex! Written like this:\n",
    "$$\n",
    "\\psi_\\text{final} = (a_0+ib_0)|00\\rangle+(a_1+ib_1)|01\\rangle+(a_2+ib_2)|10\\rangle+(a_3+ib_3)|11\\rangle\n",
    "$$\n",
    "One can see that we do, indeed, have 8 parameters on the state on which to encode our 8 features.\n",
    "\n",
    "By increasing the number of qubits and increasing the number of repetitions of entangling and rotation layers, one can encode much more data. Writing out the wave functions quickly becomes intractable. But we can still see the encoding in action."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "63ee5098-028a-4e40-a03d-c6f0e7c75605",
   "metadata": {},
   "source": [
    "Here we encode the data vector $\\vec{x} = [0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0,1.1,1.2]$ with 12 features, on a 3-qubit EfficientSU2 circuit, using each of the parameterized gates to encode a different feature."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "73fc00fe-b98f-4d63-a327-54958a8f5498",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<Image src=\"/learning/images/courses/quantum-machine-learning/data-encoding/extracted-outputs/73fc00fe-b98f-4d63-a327-54958a8f5498-0.avif\" alt=\"Output of the previous code cell\" />"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2]\n",
    "circuit = EfficientSU2(num_qubits=3, reps=1, insert_barriers=True)\n",
    "encode = circuit.assign_parameters(x)\n",
    "encode.decompose().draw(output=\"mpl\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "92a323d6-58e6-4e4d-b02f-9c63a0e78aaf",
   "metadata": {},
   "source": [
    "Instead of increasing the number of qubits, you might choose to increase the number of repetitions of entangling and rotation layers. But there are limits to how many repetitions are useful."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "90a4abfc-7d1e-4495-b36d-6f7edeed8e96",
   "metadata": {},
   "source": [
    "As previously stated, there is a tradeoff: circuits with more qubits or more repetitions of entangling and rotation layers may store more parameters, but do so with greater circuit depth. We will return to the depths of some built-in feature maps, below."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "95126be4-3e8e-4ee4-9420-1b362d88bef8",
   "metadata": {},
   "source": [
    "The next few encoding methods that are built into Qiskit have \"feature map\" as part of their names. Let us reiterate that encoding data into a quantum circuit *is* a feature mapping, in the sense that it takes data into a new space: the Hilbert space of the qubits involved. The relationship between the dimensionality of the original feature space and that of the Hilbert space will depend on the circuit you use for encoding."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a3fb23b0-ee8c-49ba-99b7-f256fdbba9d5",
   "metadata": {},
   "source": [
    "### $Z$ feature map\n",
    "\n",
    "The $Z$ feature map (ZFM) can be interpreted as a natural extension of phase encoding. The ZFM consists of alternating layers of single-qubit gates: Hadamard gate layers and phase gate layers. Let the data vector $\\vec{x}$ have $N$ features. The quantum circuit that performs the feature mapping is represented as a unitary operator that acts on the initial state:\n",
    "\n",
    "$$\n",
    "\\mathscr{U}_{\\text{ZFM}}(\\vec{x})|0\\rangle^{\\otimes N}=|\\phi(\\vec{x})\\rangle\n",
    "$$\n",
    "where $|0\\rangle^{\\otimes N}$ is the $N$-qubit ground state. This notation is used for consistency with reference [\\[4\\]](#references) Havlicek et al. The data features $x_i$ are mapped one-to-one with corresponding qubits. For example, if you have 8 features in a data vector, then you would use 8 qubits. The ZFM circuit is composed of $r$ repetitions of a subcircuit comprised of Hadamard gate layers and phase gate layers. A Hadamard layer is made up of a Hadamard gate acting on every qubit in an $n$-qubit register, $H \\otimes H \\otimes \\dots \\otimes H = H^{\\otimes n}$, within the same stage of the algorithm. This description also applies to a phase gate layer in which the $i^\\text{th}$ qubit is acted on by $P(\\vec{x}_i)$. Each $P$ gate has one feature as an argument, but the phase gate layer ($P(\\vec{x}_1)\\otimes\\ldots P(\\vec{x}_k)\\otimes\\ldots P(\\vec{x}_N)$ is a function of the data vector. The full ZFM circuit unitary with a single repetition is:\n",
    "$$\n",
    "\\mathscr{U}_{\\text{ZFM}}=\\big(P(\\vec{x}_1)\\otimes\\ldots P(\\vec{x}_k)\\otimes\\ldots P(\\vec{x}_N)H^{\\otimes N}\\big)=\\left(\\bigotimes_{k = 1}^N P(\\vec{x}_k)\\right)H^{\\otimes N}\n",
    "$$\n",
    "Then $r$ repetitions of this unitary would be\n",
    "$$\n",
    "\\mathscr{U}^{(r)}_{\\text{ZFM}}\\left(\\vec{x}\\right)=\\prod_{s=1}^{r}\\left[\\left(\\bigotimes_{k = 1}^N P(\\vec{x}_k)\\right)H^{\\otimes N}\\right]\n",
    "$$\n",
    "The data features, $x_k$, are mapped to the phase gates in the same way in all $r$ repetitions. The ZFM feature map state is a product state and is efficient for classical simulation[\\[4\\]](#references).\n",
    "\n",
    "To start with a small example, a 2-qubit ZFM circuit is coded using Qiskit and drawn to display the simple circuit structure. In the example, a single repetition, $r=1$, is implemented with the data vector $\\vec{x} = \\left(\\textstyle\\frac{1}{2}\\pi, \\textstyle\\frac{1}{3}\\pi\\right)$. The ZFM circuit unitary operator acts on the initial state in the following way:\n",
    "\n",
    "$$\n",
    "\\mathscr{U}_{\\text{ZFM}}(\\bar{x})|00\\rangle = P(\\bar{x})^{\\otimes 2} H^{\\otimes 2}|00\\rangle = \\left( P\\left(\\textstyle\\frac{1}{2}\\pi\\right)H|0\\rangle \\right) \\otimes \\left(P\\left(\\textstyle\\frac{1}{3}\\pi\\right)H|0\\rangle\\right).\n",
    "$$\n",
    "\n",
    "The formula has been rearranged around the tensor product to emphasize the operations on each qubit. The following Qiskit code uses Hadamard and phase gates explicitly to show the structure of the ZFM:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "f5c70df4-faea-4817-a870-95638eb97dbd",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<Image src=\"/learning/images/courses/quantum-machine-learning/data-encoding/extracted-outputs/f5c70df4-faea-4817-a870-95638eb97dbd-0.avif\" alt=\"Output of the previous code cell\" />"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "qc0 = QuantumCircuit(1)\n",
    "qc1 = QuantumCircuit(1)\n",
    "\n",
    "qc0.h(0)\n",
    "qc0.p(pi / 2, 0)\n",
    "\n",
    "qc1.h(0)\n",
    "qc1.p(pi / 3, 0)\n",
    "\n",
    "# Combine circuits qc0 and qc1 into 1 circuit\n",
    "qc = QuantumCircuit(2)\n",
    "qc.compose(qc0, [0], inplace=True)\n",
    "qc.compose(qc1, [1], inplace=True)\n",
    "\n",
    "qc.draw(\"mpl\", scale=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b1be7f51-3a4e-4d2e-8d51-d3ac62f4fd88",
   "metadata": {},
   "source": [
    "We now encode the same data vector $\\vec{x} = \\left(\\textstyle\\frac{1}{2}\\pi, \\textstyle\\frac{1}{3}\\pi\\right)$ to a ZFM circuit with three repetitions, $r=3$, using the Qiskit [`ZFeatureMap`](/docs/api/qiskit/qiskit.circuit.library.ZFeatureMap) class, which altogether gives us the quantum feature map $\\mathscr{U}_{\\text{ZFM}}(\\vec{x})$. By default in the `ZFeatureMap` class, parameters $\\beta$ are multiplied by 2 before mapping to the phase gate $\\beta \\rightarrow P(\\theta = 2\\beta)$. To reproduce the same encodings as above, we divide by 2."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "c15b8fe2-ae83-4c76-a908-71596deb7d82",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<Image src=\"/learning/images/courses/quantum-machine-learning/data-encoding/extracted-outputs/c15b8fe2-ae83-4c76-a908-71596deb7d82-0.avif\" alt=\"Output of the previous code cell\" />"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from qiskit.circuit.library import ZFeatureMap\n",
    "\n",
    "zfeature_map = ZFeatureMap(feature_dimension=2, reps=3)\n",
    "zfeature_map = zfeature_map.assign_parameters([(1 / 2) * pi / 2, (1 / 2) * pi / 3])\n",
    "zfeature_map.decompose().draw(\"mpl\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6551f7cd-8e7e-4313-9255-7c1a3928f625",
   "metadata": {},
   "source": [
    "You may use ZFM via Qiskit's ZFM class; you can also use this structure as inspiration to construct your own feature mapping."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "08bbc7eb-b48c-44fa-be70-852b27b73343",
   "metadata": {},
   "source": [
    "### $ZZ$ feature map\n",
    "\n",
    "The $ZZ$ feature map (ZZFM) extends the ZFM with the inclusion of two-qubit entangling gates, specifically the $ZZ$-rotation gate $R_{ZZ}(\\theta)$. The ZZFM is conjectured to be generally expensive to compute on a classical computer, unlike the ZFM.\n",
    "\n",
    "$R_{ZZ}(\\theta)$ implements a $ZZ$-interaction and is maximally entangling for $\\theta = \\textstyle{\\frac{1}{2}}\\pi$. $R_{ZZ}(\\theta)$ can be decomposed into a series of gates on two qubits, as shown in the following Qiskit code using the [RZZ gate](/docs/api/qiskit/qiskit.circuit.library.RZZGate) and the `QuantumCircuit` class method ```decompose```. We encode a single feature of the data vector $\\vec{x}$: $\\vec{x}_k=\\pi.$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "6c312a5f-91a5-499c-a391-efc73cd0e4e1",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<Image src=\"/learning/images/courses/quantum-machine-learning/data-encoding/extracted-outputs/6c312a5f-91a5-499c-a391-efc73cd0e4e1-0.avif\" alt=\"Output of the previous code cell\" />"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "qc = QuantumCircuit(2)\n",
    "qc.rzz(pi, 0, 1)\n",
    "qc.draw(\"mpl\", scale=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0795b997-4193-44cc-8310-4b8f5490aff0",
   "metadata": {},
   "source": [
    "As is often the case, we see this represented as a single gate-like unit, until we use .decompose() to see all constituent gates."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "92062646-78f2-4dd6-82ad-7a543cb6a566",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<Image src=\"/learning/images/courses/quantum-machine-learning/data-encoding/extracted-outputs/92062646-78f2-4dd6-82ad-7a543cb6a566-0.avif\" alt=\"Output of the previous code cell\" />"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "qc.decompose().draw(\"mpl\", scale=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2c8a07ad-be40-4c18-b9e2-905a46fc7f4b",
   "metadata": {},
   "source": [
    "Data is mapped with a phase rotation $P(\\theta) = e^{i\\theta/2}R_Z(\\theta)$ on the second qubit. The $R_{ZZ}(\\theta)$ gate entangles the two qubits on which it operates by a degree of entanglement determined by the encoded feature value.\n",
    "\n",
    "The full ZZFM circuit consists of a Hadamard gate and phase gate, as in the ZFM, followed by the entanglement described above. A single repetition of the ZZFM circuit is:\n",
    "\n",
    "$$\n",
    "\\mathscr{U}_{\\text{ZZFM}}(\\vec{x}) = U_{ZZ}(\\vec{x})\\big(P(\\vec{x}_1)\\otimes\\ldots P(\\vec{x}_k)\\otimes\\ldots P(\\vec{x}_N)H^{\\otimes N}\\big)=U_{ZZ}(\\vec{x})\\left(\\bigotimes_{k = 1}^N P(\\vec{x}_k)\\right)H^{\\otimes N},\n",
    "$$\n",
    "\n",
    "where $U_{ZZ}(\\vec{x})$ contains ZZ-gate layer structured by an entanglement scheme. Several entanglement schemes are shown in code blocks below. The structure of $U_{ZZ}(\\vec{x})$ also includes a function that combines the data features from qubits being entangled in the following way. Let us say that the $R_{ZZ}$ gate is to be applied to qubits $p$ and $q$. In the phase layer, these qubits have phase gates that encode $\\vec{x}_p$ and $\\vec{x}_q$ on them, respectively. The argument $\\theta_{q,p}$ of the $R_{ZZ,q,p}(\\theta_{q,p})$ will not simply be one of these features or the other, but a function often denoted by $\\phi$ (not to be confused with the azimuthal angle):\n",
    "$$\n",
    "\\theta_{q,p} \\rightarrow \\phi(\\vec{x}_q, \\vec{x}_p) = 2(\\pi-\\vec{x}_q)(\\pi-\\vec{x}_p).\n",
    "$$\n",
    "We will see this in several examples below. The extension to multiple repetitions is the same as in the ZFeatureMap case:\n",
    "$$\n",
    "\\mathscr{U}^{(r)}_{\\text{ZZFM}}\\left(\\vec{x}\\right)=\\prod_{s=1}^{r}\\left[U_{ZZ}(\\vec{x})\\left(\\bigotimes_{k = 1}^N P(\\vec{x}_k)\\right)H^{\\otimes N}\\right].\n",
    "$$\n",
    "As the operators have increased in complexity, let us first encode a data vector $\\vec{x} = (x_0, x_1)$ with a two-qubit ZZFM and one repetition using the following code:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "id": "1ec2f2e1-b665-4dde-a223-35489f17c695",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<Image src=\"/learning/images/courses/quantum-machine-learning/data-encoding/extracted-outputs/1ec2f2e1-b665-4dde-a223-35489f17c695-0.avif\" alt=\"Output of the previous code cell\" />"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from qiskit.circuit.library import ZZFeatureMap\n",
    "\n",
    "feature_dim = 2\n",
    "zzfeature_map = ZZFeatureMap(\n",
    "    feature_dimension=feature_dim, entanglement=\"linear\", reps=1\n",
    ")\n",
    "zzfeature_map.decompose(reps=1).draw(\"mpl\", scale=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2fa647fb-587d-4343-8fcb-e1bb34fea584",
   "metadata": {},
   "source": [
    "By default in Qiskit, the features $(\\vec{x}_1, \\vec{x}_2)$ are mapped together to $R_{ZZ}(\\theta)$ by this mapping function $\\theta_{1,2} = \\phi(\\vec{x}_1, \\vec{x}_2) = 2(\\pi-\\vec{x}_1)(\\pi-\\vec{x}_2)$. Qiskit allows the user to customize the function $\\phi$ (or $\\phi_S$ where $S$ is the set of qubit pairs coupled through $R_{ZZ}$ gates) as a preprocessing step.\n",
    "\n",
    "Moving to a four-dimensional data vector $\\vec{x} = (\\vec{x}_1, \\vec{x}_2, \\vec{x}_3, \\vec{x}_4)$ and mapping to a four-qubit ZZFM with one repetition, we can start to see the mapping $\\phi$ for various qubit pairs. We can also see the meaning of \"linear\" entanglement:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "id": "979c765a-e3d8-4e3e-8e52-3f816203a934",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<Image src=\"/learning/images/courses/quantum-machine-learning/data-encoding/extracted-outputs/979c765a-e3d8-4e3e-8e52-3f816203a934-0.avif\" alt=\"Output of the previous code cell\" />"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "feature_dim = 4\n",
    "zzfeature_map = ZZFeatureMap(\n",
    "    feature_dimension=feature_dim, entanglement=\"linear\", reps=1\n",
    ")\n",
    "zzfeature_map.decompose().draw(\"mpl\", scale=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "77d42bee-a283-4d19-85a6-bd9c8deedae4",
   "metadata": {},
   "source": [
    "In the linear entanglement scheme, nearest-neighbor (numbered) pairs of qubits in this circuit are entangled. There are other built-in entanglement schemes in Qiskit, including `circular` and `full`."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eec9d682-f698-47a1-bc2f-86174c2efb0f",
   "metadata": {},
   "source": [
    "### Pauli feature map\n",
    "\n",
    "The Pauli feature map (PFM) is the generalization of the ZFM and ZZFM to use arbitrary Pauli gates. The Pauli feature map takes a very similar form to the previous two feature maps. For $r$ repetitions of the encoding of the $N$ features of vector $\\vec{x},$\n",
    "\n",
    "$$\n",
    "\\mathscr{U}_{\\text{PFM}}(\\vec{x}) = \\prod_{s=1}^{r} U(\\vec{x}) H^{\\otimes n}.\n",
    "$$\n",
    "\n",
    "For PFM, $U(\\vec{x})$ is generalized to a Pauli expansion unitary operator. Here we present a more generalized form of the feature maps considered so far:\n",
    "\n",
    "$$\n",
    "U(\\vec{x}) = \\exp\\left(i \\sum_{S \\in\\mathcal{I}} \\phi_S(\\vec{x}) \\prod_{i \\in S} \\sigma_i \\right),\n",
    "$$\n",
    "\n",
    "where $\\sigma_i$ is a Pauli operator, $\\sigma_i \\in {I,X,Y,Z}$. Here $\\mathcal{I}$ is the set of all qubit connectivities as determined by the feature map, including the set of qubits acted on by single-qubit gates. That is, for a feature map in which qubit 0 was acted upon by a phase gate, and qubits 2 and 3 were acted upon by an $R_{ZZ}$ gate, the set $\\mathcal{I}$ would include $\\{\\{0\\},\\{2,3\\}\\}$. $S$ runs through all elements of that set. In previous feature maps, the function $\\phi_S(\\vec{x})$ was involved either exclusively with single-qubit gates or exclusively with two-qubit gates. Here, we define it in general:\n",
    "$$\n",
    "\\phi_S(\\vec{x})=\n",
    "    \\begin{cases}\n",
    "      x_i & \\text{if } S= \\{i\\} \\text{ (single-qubit)}\\\\\n",
    "      \\prod_{j\\in{S}}(\\pi-x_j) & \\text{if } |S|\\ge2 \\text{ (multi-qubit)}\\\\\n",
    "    \\end{cases}\n",
    "$$\n",
    "\n",
    "For documentation, see the [Qiskit `PauliFeatureMap` class documentation](/docs/api/qiskit/qiskit.circuit.library.PauliFeatureMap)). In the ZZFM, the operator $\\sigma_i$ is restricted to $Z_i$.\n",
    "\n",
    "One way to understand the above unitary is through analogy with the propagator in a physical system. The unitary above is a unitary evolution operator, $\\exp(it\\mathcal{H})$, for a Hamiltonian, $\\mathcal{H}$, similar to the Ising model, where the time parameter, $t$, is replaced with data values to drive the evolution. The expansion of this unitary operator gives the PFM circuit. The entangling connectivities in $S$ can be interpreted as Ising couplings in a spin lattice."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "29290b2a-fe6d-4c0c-805a-ae37bb2e48a9",
   "metadata": {},
   "source": [
    "Let us consider an example of Pauli $Y$ and $XX$ operators representing those Ising-type interactions. Qiskit provides a `PauliFeatureMap` class for instantiating a PFM with a choice of single- and $n$-qubit gates, which in this example will be passed as Pauli strings `‘Y’` and `‘XX’`. Typically, $n$ is 1 or 2 for single- and two-qubit interactions, respectively. The entanglement scheme is “linear,” meaning that only nearest-neighbor qubits in the quantum circuit are coupled. Note that this does not correspond to nearest-neighbor qubits on the quantum computer itself, as this quantum circuit is an abstraction layer."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "id": "5ba5df82-83c1-428c-a269-baea5f75c3dc",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<Image src=\"/learning/images/courses/quantum-machine-learning/data-encoding/extracted-outputs/5ba5df82-83c1-428c-a269-baea5f75c3dc-0.avif\" alt=\"Output of the previous code cell\" />"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from qiskit.circuit.library import PauliFeatureMap\n",
    "\n",
    "feature_dim = 3\n",
    "pauli_feature_map = PauliFeatureMap(\n",
    "    feature_dimension=feature_dim, entanglement=\"linear\", reps=1, paulis=[\"Y\", \"XX\"]\n",
    ")\n",
    "# pauli_feature_map.decompose().draw('mpl', scale=1.5)\n",
    "pauli_feature_map.decompose().draw(\"mpl\", scale=1.5)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d1adef80-5504-4cf2-8d78-3a129637f67c",
   "metadata": {},
   "source": [
    "Qiskit provides a parameter, $\\alpha$, in Pauli feature maps to control the scaling of Pauli rotations.\n",
    "\n",
    "$$\n",
    "U(\\bar{x}) = \\exp\\left(i \\alpha \\sum_{S\\subseteq[n]} \\phi_S(\\bar{x}) \\prod_{i \\in S} \\sigma_i \\right)\n",
    "$$\n",
    "\n",
    "The default value of $\\alpha$ is $2$. By optimizing its value in the interval, for example, $[0,4],$ one can better align a quantum kernel to the data."
   ]
  },
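  {
   "cell_type": "markdown",
   "id": "b7d05e19-2f4c-4a83-9c61-e8f3a0d4c752",
   "metadata": {},
   "source": [
    "The effect of $\\alpha$ can be checked against the ZZ feature map. As a worked sketch, assuming two features, the Pauli strings $Z$ and $ZZ$, 1-based qubit labels, and $\\alpha = 2$, the exponent of the unitary above becomes\n",
    "\n",
    "$$\n",
    "i\\left[2x_1 Z_1 + 2x_2 Z_2 + 2(\\pi-x_1)(\\pi-x_2) Z_1 Z_2\\right],\n",
    "$$\n",
    "\n",
    "so the two-qubit coefficient $\\alpha\\,\\phi_{\\{1,2\\}}(\\vec{x}) = 2(\\pi-x_1)(\\pi-x_2)$ reproduces the default angle $\\theta_{1,2}$ quoted earlier for the ZZ feature map."
   ]
  },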
  {
   "cell_type": "markdown",
   "id": "c457b46a-9982-45eb-b2df-12fc7251d91e",
   "metadata": {},
   "source": [
    "### Gallery of Pauli feature maps\n",
    "\n",
    "Here we visualize various Pauli feature maps for two-qubit circuits to get a better picture of the range of possibilities."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "id": "69225757-193a-490b-8ca4-d1b187b774b3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<Image src=\"/learning/images/courses/quantum-machine-learning/data-encoding/extracted-outputs/69225757-193a-490b-8ca4-d1b187b774b3-1.avif\" alt=\"Output of the previous code cell\" />"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from qiskit.visualization import circuit_drawer\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "feature_dim = 2\n",
    "fig, axs = plt.subplots(9, 2)\n",
    "i_plot = 0\n",
    "for paulis in [\n",
    "    [\"I\"],\n",
    "    [\"X\"],\n",
    "    [\"Y\"],\n",
    "    [\"Z\"],\n",
    "    [\"XX\"],\n",
    "    [\"XY\"],\n",
    "    [\"XZ\"],\n",
    "    [\"YY\"],\n",
    "    [\"YZ\"],\n",
    "    [\"ZZ\"],\n",
    "    [\"X\", \"ZZ\"],\n",
    "    [\"Y\", \"ZZ\"],\n",
    "    [\"Z\", \"ZZ\"],\n",
    "    [\"X\", \"YZ\"],\n",
    "    [\"Y\", \"YZ\"],\n",
    "    [\"Z\", \"YZ\"],\n",
    "    [\"YY\", \"ZZ\"],\n",
    "    [\"XY\", \"ZZ\"],\n",
    "]:\n",
    "    pfm = PauliFeatureMap(feature_dimension=feature_dim, paulis=paulis, reps=1)\n",
    "    circuit_drawer(\n",
    "        pfm.decompose(),\n",
    "        output=\"mpl\",\n",
    "        style={\"backgroundcolor\": \"#EEEEEE\"},\n",
    "        ax=axs[int((i_plot - i_plot % 2) / 2), i_plot % 2],\n",
    "    )\n",
    "    axs[int((i_plot - i_plot % 2) / 2), i_plot % 2].title.set_text(paulis)\n",
    "    i_plot += 1\n",
    "\n",
    "fig.set_figheight(16)\n",
    "fig.set_figwidth(16)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "61e585cc-8e84-496e-9e51-21a818731be8",
   "metadata": {},
   "source": [
    "The above can, of course, be extended to include other permutations and repetitions of Pauli matrices. Learners are encouraged to experiment with those options."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "44a27c71-acb8-4b4f-a344-836cae1e3896",
   "metadata": {},
   "source": [
    "## Review of built-in feature maps\n",
    "\n",
    "You have seen several schemes for encoding data into a quantum circuit:\n",
    "- Basis encoding\n",
    "- Amplitude encoding\n",
    "- Angle encoding\n",
    "- Phase encoding\n",
    "- Dense encoding\n",
    "\n",
    "You have seen how to construct your own feature maps using these encoding schemes, and you have seen four built-in feature maps which take advantage of angle and phase encoding:\n",
    "- EfficientSU2\n",
    "- ZFeatureMap\n",
    "- ZZFeatureMap\n",
    "- PauliFeatureMap\n",
    "\n",
    "These built-in feature maps differed from each other in several ways:\n",
    "- The depth for a given number of encoded features\n",
    "- The number of qubits required for a given number of features\n",
    "- The degree of entanglement (obviously related to the other differences)\n",
    "\n",
    "The code below applies these four built-in feature maps to the encoding of a feature set, and plots the two-qubit depth of the resulting circuit. Since two-qubit error rates are much higher than single-qubit gate error rates, one might reasonably be most interested in the depth of two-qubit gates. In the code below, we obtain counts of all gates in a circuit by first decomposing the circuit and then using count_ops(), as shown below. Here the two-qubit gates we are interested in are 'cx' gates:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "id": "7f428edb-ea96-48bb-adb2-aed91535dbeb",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<Image src=\"/learning/images/courses/quantum-machine-learning/data-encoding/extracted-outputs/7f428edb-ea96-48bb-adb2-aed91535dbeb-0.avif\" alt=\"Output of the previous code cell\" />"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Initializing parameters and empty lists for depths\n",
    "x = [0.1, 0.2]\n",
    "n_data = []\n",
    "zz2gates = []\n",
    "su22gates = []\n",
    "z2gates = []\n",
    "p2gates = []\n",
    "\n",
    "# Generating feature maps\n",
    "for n in range(3, 10):\n",
    "    x.append(n / 10)\n",
    "    zzcircuit = ZZFeatureMap(n, reps=1, insert_barriers=True)\n",
    "    zcircuit = ZFeatureMap(n, reps=1, insert_barriers=True)\n",
    "    su2circuit = EfficientSU2(n, reps=1, insert_barriers=True)\n",
    "    pcircuit = PauliFeatureMap(n, reps=1, paulis=[\"XX\"], insert_barriers=True)\n",
    "    # Getting the cx depths\n",
    "    zzcx = zzcircuit.decompose().count_ops().get(\"cx\")\n",
    "    zcx = zcircuit.decompose().count_ops().get(\"cx\")\n",
    "    su2cx = su2circuit.decompose().count_ops().get(\"cx\")\n",
    "    pcx = pcircuit.decompose().count_ops().get(\"cx\")\n",
    "\n",
    "    # Appending the cx gate counts to the lists. We shift the zz & pauli data points, because they overlap.\n",
    "    n_data.append(n)\n",
    "    zz2gates.append(zzcx - 0.5)\n",
    "    z2gates.append(0)\n",
    "    su22gates.append(su2cx)\n",
    "    p2gates.append(pcx + 0.5)\n",
    "\n",
    "# Plot the output\n",
    "plt.plot(n_data, p2gates, \"bo\")\n",
    "plt.plot(n_data, zz2gates, \"ro\")\n",
    "plt.plot(n_data, su22gates, \"yo\")\n",
    "plt.plot(n_data, z2gates, \"go\")\n",
    "plt.ylabel(\"CX Gates\")\n",
    "plt.xlabel(\"Data elements\")\n",
    "plt.legend([\"Pauli\", \"ZZ\", \"SU2\", \"Z\"])\n",
    "# plt.suptitle('ZZFeatureMap(n)')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "777bf066-af13-4065-a795-af19f67da099",
   "metadata": {},
   "source": [
    "Generally Pauli and ZZ feature maps will result in greater circuit depth and higher numbers of 2-qubit gates than EfficientSU2 and Z feature maps.\n",
    "\n",
    "Because the feature maps built into Qiskit are widely applicable, we will often not need to design our own, especially in the learning phase. However, experts in quantum machine learning will likely return to the subject of designing their own feature mapping, as they tackle two complicated challenges:\n",
    "\n",
    "1. Modern hardware: the presence of noise and the large overhead of error-correcting code mean that present-day applications will need to consider things like hardware efficiency and minimizing two-qubit gate depth.\n",
    "\n",
    "2. Mappings that fit the problem at hand: It is one thing to say that the ZZFeatureMap, for example, is difficult to simulate classically, and therefore interesting. It is quite another thing for the ZZFeatureMap to be ideally suited to __your__ machine learning task or data set. The performance of different parameterized quantum circuits on different types of data is an active area of investigation.\n",
    "\n",
    "We close with a note on hardware efficiency."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8b795f8d-1f9f-4e43-82df-f2725ff709e9",
   "metadata": {},
   "source": [
    "## Hardware-efficient feature mapping\n",
    "\n",
    "A hardware-efficient feature mapping is one that takes into account constraints of real quantum computers, in the interest of reducing noise and errors in the computation. When running quantum circuits on near-term quantum computers, there are many strategies to mitigate noise inherent to the hardware. One main strategy for hardware efficiency is the minimization of the depth of the quantum circuit so that noise and decoherence have less time to corrupt the computation. The depth of a quantum circuit is the number of time-aligned gate steps required to complete the entire computation (after circuit optimization)[\\[5\\]](#references). Recall that the depth of the abstract, logical circuit may be much lower than the depth once the circuit is transpiled for a real quantum computer.\n",
    "\n",
    "Transpilation is the process of converting the quantum circuit from a high-level abstraction to one that is ready to run on a real quantum computer, taking into account constraints of the hardware. A quantum computer has a native set of single- and two-qubit gates. This means all gates in Qiskit code have to be transpiled into the set of native hardware gates. For example, in ibm_torino, a system sporting a heron r1 processor and completed in 2023, the native or basis gates are `{CZ, ID, RZ, SX, X}`. These are the two-qubit controlled-Z gate, and single-qubit gates called identity, $Z$-rotation, square root of NOT, and NOT, respectively, providing a universal set. When implementing multi-qubit gates as an equivalent subcircuit, physical two-qubit $CZ$ gates are required, along with other single-qubit gates available in hardware. In addition, to perform a two-qubit gate on a pair of qubits that are not physically coupled, SWAP gates are added to move qubit states between qubits to enable coupling, which leads to an unavoidable extension of the circuit. Using the ```optimization``` argument that can be set from 0 up to a highest level of 3. For greater control and customizability, the transpiler pipeline can be managed with the [Qiskit Pass Manager](/docs/api/qiskit/qiskit.transpiler.PassManager). Refer to the [Qiskit Transpiler documentation](/docs/api/qiskit/transpiler) for more information on transpilation.\n",
    "\n",
    "In Havlicek et al. 2019 [\\[2\\]](#references), one way the authors achieve hardware efficiency is by using the $ZZ$ feature map because it is a second-order expansion (see the “$ZZ$ feature map” section above). An $N$-order expansion has $N$-qubit gates. IBM Quantum systems do not have native $N$-qubit gates, where $N>2$, so to implement them would require decomposition into two-qubit CNOT gates available in hardware. A second way the authors minimize depth is by choosing a $ZZ$ coupling topology that maps directly to the architecture couplings. A further optimization they undertake is targeting a higher-performing, suitably connected hardware subcircuit. Additional things to consider are minimizing the number of feature map repetitions and choosing a customized low-depth or “linear” entangling scheme instead of the “full” scheme that entangles all qubits.\n",
    "\n",
    "![Data encoding image](/learning/images/courses/quantum-machine-learning/data-encoding/qml-03-data-encoding-24.avif)\n",
    "\n",
    "The above graphic shows a network of nodes and edges that represent physical qubits and hardware couplings, respectively. The coupling map and performance of ibm_torino is shown with all possible two-qubit CZ coupling gates. Qubits are color-coded on a scale based on the T1 relaxation time in microseconds (μs), where longer T1 times are better and in a lighter shade. The coupling edges are color-coded by CZ error, where darker shades are better. Information on the hardware specification can be accessed in the hardware backend configuration schema ```IBMQBackend.configuration()```."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "98db3e11-2e00-406c-a4ee-88dc3ee82b96",
   "metadata": {},
   "source": [
    "## References\n",
    "\n",
    "1.  Maria Schuld and Francesco Petruccione, *Supervised Learning with Quantum Computers*, Springer 2018, [doi:10.1007/978-3-319-96424-9](https://www.springer.com/gp/book/9783319964232).\n",
    "2. <a id='Havlicek2018'></a>Vojtech Havlicek et al., “Supervised Learning with Quantum Enhanced Feature Spaces.” *Nature*, vol. 567 (2019): 209–212. https://arxiv.org/abs/1804.11326.\n",
    "3. Ryan LaRose and Brian Coyle, \"Robust data encodings for quantum classifiers\", Physical Review A 102, 032420 (2020), [doi:10.1103/PhysRevA.102.032420](https://journals.aps.org/pra/abstract/10.1103/PhysRevA.102.032420), [arXiv:2003.01695](https://arxiv.org/abs/2003.01695).\n",
    "4. <a id='Grover2002'></a>Lou Grover and Terry Rudolph. “Creating Superpositions That Correspond to Efficiently Integrable Probability Distributions.” arXiv:quant-ph/0208112, August 15, 2002, https://arxiv.org/abs/quant-ph/0208112.\n",
    "5. Adrián Pérez-Salinas, Alba Cervera-Lierta, Elies Gil-Fuster, José I. Latorre, \"Data re-uploading for a universal quantum classifier\",  [Quantum 4, 226 (2020)](https://quantum-journal.org/papers/q-2020-02-06-226/), [ArXiv.org/abs/1907.02085](https://arxiv.org/abs/1907.02085).\n",
    "6. Maria Schuld, Ryan Sweke, Johannes Jakob Meyer, \"The effect of data encoding on the expressive power of variational quantum machine learning models\", [Phys. Rev. A 103, 032430 (2021)](https://journals.aps.org/pra/abstract/10.1103/PhysRevA.103.032430), [arxiv.org/abs/2008.08605](https://arxiv.org/abs/2008.08605)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "571dc6bf-7c16-4623-be61-64a5b215bd69",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'1.2.2'"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import qiskit\n",
    "\n",
    "qiskit.version.get_version_info()"
   ]
  }
 ],
 "metadata": {
  "description": "An overview of data encoding methods in quantum machine learning. Encoding schemes covered include basis, amplitude, angle, and dense encoding.",
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3"
  },
  "title": "Data encoding"
 },
 "nbformat": 4,
 "nbformat_minor": 4
}