7 juin 2018, 11h00–12h15
Toulouse
Salle MC 204
MAD-Stat. Seminar
Résumé
The problem of variable clustering is that of grouping similar components of a $p$-dimensional vector $X = (X_1 , \ldots , X_p)$, and estimating these groups from $n$ independent copies of $X$. Although K-means is a natural strategy for this problem, I will explain why it cannot lead to perfect cluster recovery. Then, I will introduce a correction that can be viewed as a penalized convex relaxation of K-means. The clusters estimated by this method are shown to recover the partition $G$ at a minimax optimal cluster separation rate. Along the way, I will discuss some connections with graph clustering problems.