Please use this identifier to cite or link to this item: http://hdl.handle.net/10532/7473
Title: A critical analysis of Adaptive Box-Cox transformation for skewed distributed data management: metabolomics of Spanish and Argentinian truffles as a case study
Authors: Sibono, Leonardo
Grosso, Massimiliano
Tejedor Calvo, Raquel
Casula, Mattia
Marco Montori, Pedro
García Barreda, Sergi
Manis, Cristina
Caboni, Pierluigi
Issue Date: 2025
Citation: Sibono, L.; Grosso, M.; Tejedor-Calvo, E.; Casula, M.; Marco, P.; Garcia Barreda, S.; Manis, C.; Caboni, P. A critical analysis of Adaptive Box-Cox transformation for skewed distributed data management: metabolomics of Spanish and Argentinian truffles as a case study, Analytica Chimica Acta, 2025, 343704
Abstract: Background: Metabolic variations retrieved in metabolomic data are considered a benchmark for detecting biomatrix variability. Therefore, identifying target metabolites is crucial to keep track of any substrate modification and preserve it from any undesired alteration. Unfortunately, such a task can be negatively affected by the detection of false positives, often triggered by complicated data distributions. In this work, we investigated the metabolic profile of Spanish and Argentine truffles using a robust methodology. The issue of skewed data distributions was addressed through a normalisation preprocessing step, enhancing biomarker identification and sample classification.
Results: A data normality-improved parametric test (ANOVA) was employed to define the target metabolites, which significantly vary between two regions of origin: Spain and Argentina. Specifically, Adaptive Box-Cox transformation was employed to improve the ANOVA test's performance so that data distributions were fitted to a Gaussian variable. Using the Bonferroni-Holm method for false discovery rate correction, we demonstrated the effectiveness of this transformation for the case under investigation. Results were compared with two non-parametric tests (Kruskal-Wallis and Permutation test), selected as a reference methodology, to provide a better understanding of non-normal distributions often encountered in metabolomic data analysis. Of the 57 investigated metabolites, 17 exhibited notable variability across the two geographical regions. The validity of this methodology was supported through the discrimination of samples belonging to different groups. In this regard, both univariate and multivariate statistical models were tested through Monte Carlo simulations and yielded consistent results.
Significance: Data analysis outcomes are sensitive to variable distributions. The present study shows an effective tool to increase data normality, thereby enhancing the statistical power for biomarker discovery and improving models’ classification performances. These results find justification from the current knowledge within the field of food sciences, enabling their application in advancing research in the truffle analysis domain.
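As a rough illustration of the workflow the abstract describes (transform skewed data towards normality, test each metabolite with ANOVA, compare against Kruskal-Wallis, and correct with the Holm step-down procedure), the sketch below may help. It rests on assumptions: it uses the standard maximum-likelihood Box-Cox transform from SciPy rather than the paper's Adaptive Box-Cox variant, the data are synthetic log-normal draws rather than truffle measurements, and the function names `holm_adjust` and `compare_groups` are hypothetical.

```python
import numpy as np
from scipy import stats


def holm_adjust(pvalues):
    """Return Holm step-down adjusted p-values in the original order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    # The k-th smallest p-value (k = 0..m-1) is multiplied by (m - k).
    scaled = p[order] * (m - np.arange(m))
    # Enforce monotonicity across the sorted values and cap at 1.
    adjusted_sorted = np.minimum(np.maximum.accumulate(scaled), 1.0)
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted
    return adjusted


def compare_groups(group_a, group_b):
    """Box-Cox transform the pooled (strictly positive) data, then test.

    Assumption: plain Box-Cox with lambda chosen by maximum likelihood,
    not the Adaptive Box-Cox procedure analysed in the paper.
    """
    pooled = np.concatenate([group_a, group_b])
    transformed, lam = stats.boxcox(pooled)
    ta, tb = transformed[: len(group_a)], transformed[len(group_a):]
    p_anova = stats.f_oneway(ta, tb).pvalue        # parametric, transformed data
    p_kw = stats.kruskal(group_a, group_b).pvalue  # rank-based, raw data
    return lam, p_anova, p_kw


# Synthetic example: 5 skewed "metabolites", the first 2 shifted between groups.
rng = np.random.default_rng(0)
anova_p, kw_p = [], []
for j in range(5):
    shift = 1.0 if j < 2 else 0.0
    a = rng.lognormal(mean=0.0, sigma=1.0, size=20)
    b = rng.lognormal(mean=shift, sigma=1.0, size=20)
    _, pa, pk = compare_groups(a, b)
    anova_p.append(pa)
    kw_p.append(pk)

print("Holm-adjusted ANOVA p-values:", holm_adjust(anova_p))
print("Holm-adjusted Kruskal-Wallis p-values:", holm_adjust(kw_p))
```

Pooling both groups before estimating the Box-Cox parameter keeps a single transformation per metabolite; other choices are possible and may behave differently.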
URI: http://hdl.handle.net/10532/7473
Related document: https://doi.org/10.1016/j.aca.2025.343704
ISSN: 00032670
License: http://creativecommons.org/licenses/by-nc-nd/3.0/es/
Appears in Collections: [DOCIART] Artículos científicos, técnicos y divulgativos

Files in This Item:
File: 10128743.pdf
Size: 1.75 MB
Format: Adobe PDF


This item is licensed under a Creative Commons License.
