Séminaire

Variable clustering: optimal bounds and a convex approach

Nicolas Verzelen (MISTEA - SupAgro Montepellier)

7 juin 2018, 11h00–12h15

Toulouse

Salle MC 204

MAD-Stat. Seminar

Résumé

The problem of variable clustering is that of grouping similar components of a $p$-dimensional vector $X = (X_1 , \ldots , X_p)$, and estimating these groups from $n$ independent copies of $X$. Although K-means is a natural strategy for this problem, I will explain why it cannot lead to perfect cluster recovery. Then, I will introduce a correction that can be viewed as a penalized convex relaxation of K-means. The clusters estimated by this method are shown to recover the partition $G$ at a minimax optimal cluster separation rate. Along the way, I will discuss some connections with graph clustering problems.