Article

lrSVD: An efficient imputation algorithm for incomplete high-throughput compositional data

Javier Palarea-Albaladejo, Josep Antoni Martín-Fernández, Anne Ruiz-Gazen, and Christine Thomas-Agnan

Abstract

Compositional methods have been successfully integrated into the chemometric toolkit to analyse and model different types of data generated by modern high-throughput technologies. Within this compositional framework, the focus is on the relative information conveyed by the data, which is captured through log-ratio coordinate representations. However, log-ratios cannot be computed when the data matrix is incomplete. This article introduces a new, computationally efficient data imputation algorithm, based on compositional principles and aimed at high-throughput continuous-valued compositions, that relies on a constrained low-rank matrix approximation of the data. Simulated and real metabolomics data are used to demonstrate its performance and its ability to deal with different forms of incomplete data: zeros, nondetects, missing values, or a combination of them. The computer routines lrSVD and lrSVDplus are implemented in the R package zCompositions to facilitate use of the algorithm by practitioners.
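To make the idea of a low-rank approximation in log-ratio coordinates more concrete, the following Python sketch shows a generic iterative truncated-SVD imputation carried out in centred log-ratio (clr) coordinates. It is an illustration under simplifying assumptions, not the lrSVD routine from zCompositions: the function name `impute_low_rank_clr`, its parameters, the column-mean initialisation and the stopping rule are all hypothetical. It treats only missing values marked as NaN and leaves out the constraints that would keep imputed nondetects below their detection limits.

```python
import numpy as np


def impute_low_rank_clr(x, rank=2, max_iter=200, tol=1e-8):
    """Fill NaN cells of a strictly positive matrix (rows = samples).

    Hypothetical helper: a generic iterative rank-`rank` SVD fit in clr
    coordinates, not the published lrSVD algorithm.
    """
    x = np.asarray(x, dtype=float)
    missing = np.isnan(x)

    # Work on the log scale; missing cells get a placeholder first.
    log_x = np.log(np.where(missing, 1.0, x))
    # Initialise missing cells with the mean of the observed logs per column.
    col_means = np.nanmean(np.where(missing, np.nan, log_x), axis=0)
    log_x = np.where(missing, col_means, log_x)

    for _ in range(max_iter):
        # clr coordinates: log values minus their row mean.
        row_means = log_x.mean(axis=1, keepdims=True)
        clr = log_x - row_means
        centre = clr.mean(axis=0, keepdims=True)

        # Rank-`rank` approximation of the column-centred clr matrix.
        u, s, vt = np.linalg.svd(clr - centre, full_matrices=False)
        fitted = centre + (u[:, :rank] * s[:rank]) @ vt[:rank]

        # Update only the missing cells, mapped back to the log scale.
        updated = np.where(missing, fitted + row_means, log_x)
        change = np.max(np.abs(updated - log_x))
        log_x = updated
        if change < tol:
            break

    # Observed cells are returned unchanged (up to floating-point rounding).
    return np.exp(log_x)
```

Calling `impute_low_rank_clr(x, rank=1)` on a positive matrix with a few NaN entries returns a complete matrix of the same shape with the observed values preserved. Because only ratios between parts carry information, the result can be re-closed to a constant row sum afterwards if the analysis requires it.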

Reference

Javier Palarea-Albaladejo, Josep Antoni Martín-Fernández, Anne Ruiz-Gazen, and Christine Thomas-Agnan, lrSVD: An efficient imputation algorithm for incomplete high-throughput compositional data, Journal of Chemometrics, vol. 36, n. 12, December 2022.

Published in

Journal of Chemometrics, vol. 36, n. 12, December 2022