June 1, 2010, 14:00–15:30
Toulouse
Room MF 323
Statistics Seminar
Abstract
An effective method for outlier detection should both identify a large proportion of the outliers when they are present in the data and raise few false alarms when there is no contamination. Basic as they may seem, these two requirements conflict, and they are the "enemy brothers" of statistically principled outlier detection rules. We describe a compromise between them in the multivariate framework, when location and scatter are estimated by the Reweighted Minimum Covariance Determinant (RMCD) method. For this purpose, we address two basic issues. First, we describe an approximation to the exact distribution of robust distances from which reliable cut-off values can be obtained even in small samples. Second, we investigate the multiplicity issues that arise when several outliers are present. We describe how a careful choice of the error rate controlled during the outlier detection process can yield the required compromise when alternatives to strong control of the Family-Wise Error Rate are considered.
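
To illustrate the multiplicity issue, the sketch below contrasts two ways of turning per-observation p-values into outlier flags. It is only an illustration under stated assumptions, not the procedure presented in the talk: the abstract does not say which alternative to Family Wise Error Rate control is used, so the Benjamini-Hochberg rule for the False Discovery Rate is assumed here as one common choice. The p-values are made up, standing in for tail probabilities of robust distances, and the function names are hypothetical.

```python
def bonferroni_flags(p_values, alpha=0.05):
    """Flag observation i when p_i <= alpha / n (strong FWER control)."""
    n = len(p_values)
    return [p <= alpha / n for p in p_values]


def benjamini_hochberg_flags(p_values, alpha=0.05):
    """Flag observations with the Benjamini-Hochberg step-up rule (FDR control).

    Assumption: FDR is used here as an example of a weaker error rate;
    the abstract does not name a specific alternative.
    """
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    # Largest 1-based rank k with p_(k) <= k * alpha / n; 0 if there is none.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * alpha / n:
            k_max = rank
    flags = [False] * n
    for i in order[:k_max]:
        flags[i] = True
    return flags


# Hypothetical p-values, one per observation (illustrative numbers only).
p_values = [0.0004, 0.004, 0.012, 0.019, 0.20, 0.35, 0.48, 0.61, 0.77, 0.93]
fwer_flags = bonferroni_flags(p_values)
fdr_flags = benjamini_hochberg_flags(p_values)
print("flagged under Bonferroni:", sum(fwer_flags))
print("flagged under Benjamini-Hochberg:", sum(fdr_flags))
```

On these made-up values the Bonferroni rule flags fewer observations than the step-up rule, which shows the trade-off in miniature: strict control of false alarms costs detection power when several outliers are present.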