AlgorithmsHere we explain the setup and algorithm when the data points reside on linear subspaces, the algorithm can also cluster affine subspaces with simple modifications (see Section 2.2 of Robust Subspace Clustering for further detail). Setup and notationpoints in dimensions normalized to have unit Euclidean norm . Noisy problem
Missing data problem
General StructureThe general structure of the algorithm follows the standard machine:
Following Sparse Subspace Clustering (SSC) of Elhamifar and Vidal, we use ideas from sparse representation theory and sparse regression to compute affinities. Sparse regression We first, calculate a sparse coefficient sequence obtained by regressing the th data point onto all other data points . The hope is that such a sparse representation of would only select vectors from the cluster in which belongs to. This is depicted in the figure below with each color denoting points from a different cluster. We propose two strategies for the sparse regression step below.
Building a graph based on the sparse coefficients We set After applying a permutation which makes sure that columns in the same cluster are contiguous we expect the affinity matrix to look like the figure below where each block corresponds to a different cluster. Two Step Procedure with Data Driven RegularizationFor We choose in a data driven fashion: Choice of parameters: . Bias-corrected Dantzig SelectorAgain regress one against the others this time by Dantzig Selector. subject to . and . In practice we only get to observe the noisy versions and . Solution: Build unbiased estimators for and . Set Finally, we arrive at at the following sequence of optimization problems. subject to . Choice of parameters: . Bias-corrected Dantzig Selector for missing datadenotes the observations from the -th column of the clean data matrix denotes the submatrix of with rows selected by . Again regress one against the others this time by Dantzig Selector. subject to . and . In practice we only get to observe a few entries. Solution: Again build unbiased estimators for and . Set We build a matrix based on the observed entries of as follows. Set if observed and if missing. Set with the th row and column set to zero. Set with the th row set to zero. Finally, we arrive at subject to . Choice of parameters: , where number of missing entries/total number of entries. |