Let us start by assuming that we have N datapoints, each being described by m features. The goal is to classify these datapoints into k clusters. Without loss of generality, datapoints are described by m-dimensional vectors (overrightarrow {x}_i), with (i = 1, 2, ldots , N). To implement a clustering of the data we could, for instance, use classical bit variables (q_i^a = 0,1), with (i = 1, 2, ldots , N) and (a = 1, 2, ldots , k), so that (q_i^a = 0) if datapoint i is not in cluster a, and (q_i^a = 1) if it is in the cluster. Let us also call (d(overrightarrow {x}_i, overrightarrow {x}_j)) some distance measure between datapoints (overrightarrow {x}_i) and (overrightarrow {x}_j). With this notation we build a classical cost function H such that points very far away tend to fall into different clusters4:
$$ H = frac{1}{2}sumlimits_{{i,j = 1}}^{N} d (overrightarrow {x} _{i} ,overrightarrow {x} _{j} )sumlimits_{{a = 1}}^{k} {q_{i}^{a} } q_{j}^{a} .{text{ }} $$
(1)
Additionally, one must impose the constraint that every point falls into one and only one cluster, i.e.,
$$begin{aligned} sum _{a=1}^k q^a_i = 1 ~~ forall i. end{aligned}$$
(2)
The bit configuration optimizing Eq. (1) under the above constraint provides a solution to the clustering of the data. As explained in Ref.4, this can be rephrased naturally as a Quadratic Binary Optimization Problem (QUBO) of (k times N) bit variables, so that it can be solved by a quantum annealer. However, on a gate-based quantum computer, we can use a Variational Quantum Eigensolver (VQE)7 with fewer qubits as follows. Let us call (f^a_i equiv |langle psi _i | psi ^a rangle |^2) the fidelity between a variational quantum state (vert psi _i rangle ) for datapoint (overrightarrow {x}_i) and a reference state (vert psi ^a rangle ) for cluster a. In a VQE algorithm, we could just sample terms (h_{ij}^a),
$$begin{aligned} h_{ij}^a = d(overrightarrow {x}_i,overrightarrow {x}_j) f_i^a f_j^a, end{aligned}$$
(3)
for all datapoints i,j and clusters a, together with penalty terms (c_i),
$$begin{aligned} c_{i} = left( sum _{a=1}^k f_i^a - 1right) ^2, end{aligned}$$
(4)
which are taken into account via Lagrange multipliers for all datapoints i. This last term must only be taken into account if several configurations of the qubits forming the VQE circuit allow for multiple clusters a simultaneously for the same datapoint, e.g., if we codified one qubit per cluster as in Eq. (1).
Our approach here, though, is not to relate the number of qubits to the number of clusters. Instead, we work with some set of predefined states (vert psi ^a rangle in {mathcal {H}}), not necessarily orthogonal, and being ({mathcal {H}}) whichever Hilbert space being used for the VQE. This provides us with enormous flexibility when designing the algorithm. For instance, we could choose states (vert psi ^a rangle ) to be a set of maximally mutually-orthogonal states2 in ({mathcal {H}}). In the particular case of one qubit only, we would then have ({mathcal {H}} = {{mathbb {C}}}^2) and the set of maximally-orthogonal states would correspond to the k vertices of a platonic solid inscribed within the Bloch sphere. The corresponding VQE approach would then correspond to a simple quantum circuit of just one qubit involving the fine-tuning of a single one-qubit rotation, and no sampling of the constraints in Eq. (4) would be needed at all, since this would be satisfied by construction. And for more qubits, the corresponding generalization would involve interesting entangled states in ({mathcal {H}}).
In addition to this, the terms to be sampled can be further refined to improve algorithmic performance. One can for instance introduce modified cost functions, such as
$$begin{aligned} h_{ij}^a= & {} d(overrightarrow {x}_i,overrightarrow {x}_j)^{-1} left( 1 - f_i^a f_j^a right) end{aligned}$$
(5)
$$begin{aligned} h_{ij}^a= & {} left( d(overrightarrow {x}_i,overrightarrow {x}_j)^alpha + lambda d(overrightarrow {x}_i,overrightarrow {c}_i)right) f_i^a f_j^a end{aligned}$$
(6)
$$begin{aligned} h_{ij}^a= & {} left( d(overrightarrow {x}_i,overrightarrow {x}_j)^alpha + lambda d(overrightarrow {x}_i,overrightarrow {c}_i)right) left( 1-f_i^aright) left( 1- f_j^a right) . end{aligned}$$
(7)
In the above cost functions, the first one tends to aggregate together in the same cluster those datapoints that are separated by a short distance, which is the complementary view to the original cost function in Eq. (3). The second one includes two regularization hyperparameters (alpha ) and (lambda ), where (alpha ) allows for modified penalizations for the distances between points, and (lambda ) accounts for the relative importance of the distance between datapoint (overrightarrow {x}_i) and the centroid formed by the elements belonging to the same cluster than point i, which we call (overrightarrow {c}_i). This centroid can be re-calculated self-consistently throughout the running of the algorithm. Additionally, one can consider cost functions with a different philosophy, such as the third one, where datapoints with a large separation distance tend to be either in different clusters, but not ruling our the chance of being in the same cluster. On top of all these possibilities, one could also consider combining them in a suitable way to build even more plausible cost functions. Eventually, the goodness of a cost function depends on the actual dataset, so for each particular case it is worth trying several of them.
The rest of the algorithm follows the standards in unsupervised learning. After a preprocessing of the data (e.g., normalization), we define the suitable set of states (vert psi _a rangle ) and set the characteristics of the variational quantum circuit, including the parameters to be optimized. We set them the classical optimizer for the VQE loop (e.g., Adam optimizer) and its main features (learning rate, batch size, etc.). After initialization, and if needed, we compute the centroids (overrightarrow {c}_i) and distances (d(overrightarrow {x}_i, overrightarrow {x}_j), d(overrightarrow {x}_i, overrightarrow {c}_i)). We then perform the VQE optimization loop for a fixed number of epochs, where new parameters of the variational quantum circuit are computed at each epoch. To accelerate performance in the VQE loop, one can include only in the sampling those terms that have a non-negligible contribution. The final step involves estimating, for a given datapoint, the cluster to which it belongs. This can be done implementing quantum state tomography (either classical or quantum), so that we can read out the final state (vert psi _i rangle ) for a given datapoint (overrightarrow {x}_i), and determine to which cluster it belongs by looking for the maximum of fidelities (f_i^a) for all clusters a.
View original post here:
Variational quantum and quantum-inspired clustering | Scientific ... - Nature.com