Working paper

ICS for complex data with application to outlier detection for density data

Camille Mondon, Thi-Huong Trinh, Anne Ruiz-Gazen, and Christine Thomas-Agnan

Abstract

Invariant coordinate selection (ICS) is a dimension reduction method used as a preliminary step for clustering and outlier detection. It has been primarily applied to multivariate data. This work introduces a coordinate-free definition of ICS in an abstract Euclidean space and extends the method to complex data. Functional and distributional data are preprocessed into a finite-dimensional subspace. For example, in the framework of Bayes Hilbert spaces, distributional data are smoothed into compositional spline functions through the Maximum Penalised Likelihood method. We describe an outlier detection procedure for complex data and study the impact of some preprocessing parameters on the results. We compare our approach with other outlier detection methods through simulations, and it produces promising results in scenarios with a low proportion of outliers. Applied to a sample of daily maximum temperature distributions recorded across the provinces of Northern Vietnam between 1987 and 2016, ICS detects abnormal climate events.

Keywords

Bayes spaces; Distributional data; Extreme weather; Functional data; Invariant coordinate selection; Outlier detection; Temperature distribution

Reference

Camille Mondon, Thi-Huong Trinh, Anne Ruiz-Gazen, and Christine Thomas-Agnan, ICS for complex data with application to outlier detection for density data, TSE Working Paper, n. 24-1585, October 2024, revised May 2025.

Published in

TSE Working Paper, n. 24-1585, October 2024, revised May 2025