Seminar

Explainable anomaly detection using data depth

Pavlo Mozharovskyi (Télécom ParisTech)

October 16, 2025, 11:00–12:15

Toulouse

Room: Auditorium 4

MAD-Stat. Seminar

Abstract

Anomaly detection is a branch of data analysis and machine learning that aims to identify observations exhibiting abnormal behaviour. Whether the anomalies are measurement errors, disease development, severe weather, production quality defects (defective items) or failed equipment, financial fraud or crisis events, their timely identification, isolation and explanation constitute an important task in almost any branch of science and industry. By providing a robust ordering, data depth (a statistical function that measures how strongly any point of the space belongs to a data set) becomes a particularly useful tool for detecting anomalies. Already known for its theoretical properties, data depth has undergone substantial computational development over the last decade, and particularly in recent years, which has made it applicable to contemporary-sized problems of data analysis and machine learning. We study data depth as an efficient anomaly detection tool in a multivariate setting, assigning abnormality labels to observations with lower depth values. We discuss practical questions: the necessity and reasonableness of invariances and the shape of the depth function, its robustness and computational complexity, and the choice of the threshold. Furthermore, we introduce a new statistical tool dedicated to the exploratory analysis of abnormal observations using data depth as a score. Abnormal component analysis (ACA for short) is a method that searches for a low-dimensional data representation which best visualises and explains anomalies. This low-dimensional representation not only makes it possible to distinguish groups of anomalies better than state-of-the-art methods, but also provides an explanation for anomalies that is linear in the variables and thus easily interpretable. Illustrations include use cases that highlight the advantageous behaviour of data depth and of explainable anomaly detection in various settings. (Joint work with Romain Valla and Florence d'Alché-Buc.)
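
For readers unfamiliar with depth-based scoring, the idea of "assigning abnormality labels to observations with lower depth values" can be sketched as follows. This is only an illustration and not the speaker's implementation: it assumes projection depth as the depth notion, approximates it with random directions, and uses hypothetical names (`fit_directions`, `projection_depth`), synthetic data and an arbitrary threshold.

```python
import math
import random
import statistics


def fit_directions(data, n_directions=200, seed=0):
    """Precompute (direction, median, MAD) of the projected data
    for a set of random unit directions."""
    rng = random.Random(seed)
    dim = len(data[0])
    fitted = []
    for _ in range(n_directions):
        # Normalised Gaussian vector: uniform direction on the unit sphere.
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in v))
        u = [c / norm for c in v]
        proj = [sum(a * b for a, b in zip(u, row)) for row in data]
        med = statistics.median(proj)
        mad = statistics.median([abs(p - med) for p in proj])
        if mad > 0.0:  # skip degenerate directions
            fitted.append((u, med, mad))
    return fitted


def projection_depth(point, fitted):
    """Approximate projection depth: 1 / (1 + worst-case robust z-score)."""
    outlyingness = max(
        abs(sum(a * b for a, b in zip(u, point)) - med) / mad
        for u, med, mad in fitted
    )
    return 1.0 / (1.0 + outlyingness)


# Synthetic example (assumption): a Gaussian cloud plus one planted anomaly.
rng = random.Random(42)
data = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(200)]
data.append([8.0, 8.0])

fitted = fit_directions(data)
depths = [projection_depth(x, fitted) for x in data]

threshold = 0.1  # hypothetical cut-off; choosing it is a question the talk addresses
flagged = [i for i, d in enumerate(depths) if d < threshold]
```

Here `flagged` holds the indices of observations whose depth falls below the threshold. The direction that attains the maximum for a flagged point gives a rough linear hint about why it is unusual, though this is not the ACA method described in the abstract.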