Robust Subspace Clustering


Subspace clustering is the problem of finding a multi-subspace representation that best fits a collection of points taken from a high-dimensional space. This arises naturally in fields as diverse as identification and classification of diseases, text categorization, network topology inference, security and privacy in recommender systems, system identification, hyper-spectral imaging, identification of switched linear systems, and music analysis.

Subspace clustering can be regarded as a generalization of PCA in which points do not lie around a single lower-dimensional subspace but rather around a union of subspaces as shown in the picture on the left. It can also be seen as a nonstandard clustering problem in which neighbors are not close according a pre-defined notion of metric but rather belong to the same lower dimensional structure.

This webpage introduces a collection of subspace clustering algorithms, which are tractable and provably robust to various form of data imperfections such as noise, missing data, sparse outliers. Numerical experiments indicate that these algorithms, which are almost parameter free, are effective on a wide variety of data segmentation problems.