Data normalisation methods on microarray data

Inggih PERNAMA

inggihpermana@uin-suska.ac.id

Shir Li WANG

shirli_wang@meta.upsi.edu.my

Hoi Yeh LEE

leehoiyeh@fskik.upsi.edu.my

Suliana SULAIMAN

suliana@meta.upsi.edu.my

Hasnatul Nazuha HASSAN

nazuha@meta.upsi.edu.my

Abstract

Data normalisation is a critical preprocessing step for machine learning, especially for high-dimensional, small-sample datasets such as those encountered in microarray analysis. This study investigates the impact of eight normalisation methods on the classification performance of microarray data: Vector Normalisation (L2 Normalisation), Quantile Normalisation (Gaussian and Uniform variants), Maximum Absolute Scaling, Z-score, Min-Max, Power Transformation, and Robust Scaling. Using an Extreme Learning Machine (ELM) as the classifier, the research evaluates performance on three leukaemia datasets with 2, 3 and 4 classes, respectively. The results show that Vector Normalisation consistently outperforms all other normalisation methods. In the 2-class scenario, it achieved the highest accuracy (87.50%) and F1-score (87.08%). Although unnormalised data showed a similar average accuracy, Vector Normalisation proved empirically superior because of its markedly lower standard deviation (10.89), which indicates a more stable and reproducible model. This stability became even more important in the 3-class scenario, where overall performance declined: Vector Normalisation still led with 61.67% accuracy and a 52.18% F1-score, while other methods, particularly simple scaling techniques such as Min-Max, showed a sharp drop and extreme instability. In the 4-class scenario, performance rebounded, and Vector Normalisation kept its top position with 72.92% accuracy and a 66.17% F1-score. The findings indicate that Vector Normalisation is the most effective of the evaluated normalisation methods for microarray data, delivering both high performance and superior stability across varying levels of class complexity.
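To make the distinction between the compared methods concrete, the following is a minimal Python sketch of three of them: Vector (L2) Normalisation, Min-Max and Z-score. It is an illustration under stated assumptions, not the authors' implementation. In particular, it assumes that Vector Normalisation rescales each sample (one row of expression values) to unit Euclidean length, whereas Min-Max and Z-score operate on each feature (one column). The function names and the toy values are hypothetical.

```python
import math


def l2_normalise(sample):
    """Scale one sample (a list of expression values) to unit Euclidean length.

    Assumption: normalisation is applied per sample (row), not per gene.
    """
    norm = math.sqrt(sum(v * v for v in sample))
    if norm == 0.0:
        return list(sample)  # an all-zero sample is left unchanged
    return [v / norm for v in sample]


def min_max_scale(column):
    """Map one feature (column) linearly onto the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # constant feature: no spread to scale
    return [(v - lo) / (hi - lo) for v in column]


def z_score(column):
    """Centre one feature on zero mean and scale it to unit (population) variance."""
    mean = sum(column) / len(column)
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
    if std == 0.0:
        return [0.0 for _ in column]  # constant feature
    return [(v - mean) / std for v in column]


# Hypothetical toy data: 2 samples x 3 genes.
samples = [[3.0, 4.0, 0.0], [1.0, 2.0, 2.0]]

# Row-wise: each sample ends up with Euclidean length 1.
l2_rows = [l2_normalise(s) for s in samples]

# Column-wise: each gene is scaled using statistics over all samples.
first_gene = [s[0] for s in samples]
first_gene_min_max = min_max_scale(first_gene)
first_gene_z = z_score(first_gene)
```

Under this per-sample assumption, L2 normalisation uses no statistics estimated from other samples, while the column-wise methods depend on minima, maxima, means and variances computed from only a handful of samples. That is one plausible reading of the stability difference reported in the abstract, though the abstract itself does not state a cause.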

Keywords: classification, data normalisation, extreme learning machine, microarray data

Sustainable Development Goals (SDG)

  • 3 - Good health and well-being

Article Details

PERNAMA, I., WANG, S. L., LEE, H. Y., SULAIMAN, S., & HASSAN, H. N. (2026). Data normalisation methods on microarray data. Applied Computer Science, 22(2), 169–179. https://doi.org/10.35784/acs_9078