Aurore ARCHIMBAUD's PhD defense, January 26th

January 26, 2018 Research

Aurore ARCHIMBAUD will defend her thesis in Mathematics, "Statistical Methods for Outlier Detection for High-Dimensional Data", on Friday, January 26, 2018, at 3:00 PM in room MF 323.

Supervisor: Anne RUIZ-GAZEN, TSE researcher, UT1 Capitole.

The jury members are:

  • M. Jérôme SARACCO – University of Bordeaux
  • M. Klaus NORDHAUSEN – University of Technology
  • Mme Julie JOSSE – Ecole Polytechnique
  • Mme Béatrice LAURENT-BONNEAU – INSA
  • M. Valentin TODOROV – Statistics Division of UNIDO
  • M. Andrea CERIOLI – Università degli studi di Parma
  • M. François BERGERET – IPPON INNOVATION
  • Mme Carole SOUAL


Abstract:

Unsupervised outlier detection is a crucial issue in statistics. In the industrial context of fault detection in particular, it is essential for ensuring high-quality production. With the exponential increase in the number of measurements on electronic components, the problem of high-dimensional data arises in the identification of outlying observations. The company ippon innovation, an expert in industrial statistics and anomaly detection, wanted to address this new situation and therefore collaborated with the TSE-R research laboratory by funding this thesis.

The first chapter presents the quality control context and the procedures most commonly used in the automotive semiconductor industry. However, these practices do not meet the new requirements of high-dimensional data, so other solutions need to be considered. The remainder of the chapter summarizes unsupervised multivariate methods for outlier detection, with a particular emphasis on those dealing with high-dimensional data.

Chapter 2 demonstrates that the well-known Mahalanobis distance has difficulty detecting outlying observations that lie in a small subspace when the number of variables is large. In this context, the Invariant Coordinate Selection (ICS) method is introduced as an interesting alternative for highlighting the structure of outlierness. A methodology for selecting only the relevant components is proposed. A simulation study provides a comparison with benchmark methods. The performance of the proposal is also evaluated on real industrial data sets.
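For readers unfamiliar with the benchmark mentioned above, the following is a minimal sketch of the squared Mahalanobis distance for two variables, written in plain Python. The data, the function name `mahalanobis_sq`, and the test points are illustrative assumptions and are not taken from the thesis; the sketch does not implement ICS or the ICSOutlier package.

```python
from statistics import mean

def mahalanobis_sq(x, data):
    """Squared Mahalanobis distance of point x from the sample mean of
    two-variable data, using the sample covariance (denominator n - 1)."""
    n = len(data)
    m0 = mean(row[0] for row in data)
    m1 = mean(row[1] for row in data)
    # Entries of the 2x2 sample covariance matrix S.
    s00 = sum((row[0] - m0) ** 2 for row in data) / (n - 1)
    s11 = sum((row[1] - m1) ** 2 for row in data) / (n - 1)
    s01 = sum((row[0] - m0) * (row[1] - m1) for row in data) / (n - 1)
    det = s00 * s11 - s01 ** 2
    d0, d1 = x[0] - m0, x[1] - m1
    # Quadratic form d^T S^{-1} d with the closed-form inverse of a 2x2 matrix.
    return (s11 * d0 * d0 - 2 * s01 * d0 * d1 + s00 * d1 * d1) / det

# Hypothetical, strongly correlated toy data (second variable close to the first).
data = [(1, 1.1), (2, 1.9), (3, 3.2), (4, 3.8), (5, 5.1), (6, 5.9)]

off_trend = (2, 5)  # inside both marginal ranges, but breaks the correlation
on_trend = (7, 7)   # outside the marginal ranges, but follows the correlation

print(mahalanobis_sq(off_trend, data))
print(mahalanobis_sq(on_trend, data))
```

In this toy case the off-trend point receives a much larger distance than the on-trend point, because the distance accounts for the correlation between the variables. The sketch only shows what the distance measures in low dimension; the difficulties described in Chapter 2 concern the case where the number of variables is large.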

This new procedure has been implemented in an R package, ICSOutlier, presented in Chapter 3, and in an R Shiny application (package ICSShiny) that makes it more user-friendly. When the number of dimensions increases, the multivariate scatter matrices become singular as soon as some variables are collinear or the number of variables exceeds the number of observations. However, in the presentation of ICS by Tyler et al. (2009), the scatter estimators are defined as positive definite matrices.
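The singularity described above can be checked on a small example. The sketch below is an illustration only: it uses the ordinary sample covariance matrix as the scatter matrix (one possible choice, assumed here), exact rational arithmetic, and made-up data sets; the helper names `covariance` and `det3` are hypothetical.

```python
from fractions import Fraction

def covariance(data):
    """Sample covariance matrix (denominator n - 1) in exact arithmetic.
    Expects rows of integers."""
    n, p = len(data), len(data[0])
    means = [Fraction(sum(row[j] for row in data), n) for j in range(p)]
    return [
        [
            sum((row[j] - means[j]) * (row[k] - means[k]) for row in data) / (n - 1)
            for k in range(p)
        ]
        for j in range(p)
    ]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )

# Hypothetical data sets with 3 variables each.
full_rank = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1), (2, 0, 1)]
collinear = [(1, 2, 3), (2, 1, 3), (0, 4, 4), (3, 3, 6)]  # third = first + second
few_obs = [(1, 5, 2), (4, 0, 7)]                          # 2 observations, 3 variables

for name, data in [("full_rank", full_rank), ("collinear", collinear), ("few_obs", few_obs)]:
    d = det3(covariance(data))
    print(name, "singular" if d == 0 else "non-singular")
```

A zero determinant means the matrix cannot be inverted, so it is not positive definite. This is the situation in which the original definition of ICS no longer applies directly.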

Chapter 4 proposes three different ways of adapting the ICS method to singular scatter matrices and investigates their theoretical properties, with particular attention to the question of affine invariance. Finally, the last chapter is dedicated to the algorithm developed for the company. Although the algorithm is confidential, the chapter presents the main ideas and the challenges, mostly numerical, encountered during its development.