Data normalisation methods on microarray data

Inggih PERMANA; Shir Li WANG; Hoi Yeh LEE; Suliana SULAIMAN; Hasnatul Nazuha HASSAN

doi:10.35784/acs_9078

Published: Jun 30, 2026

Issue Vol. 22 No. 2 (2026)

Authors

Inggih PERMANA

inggihpermana@uin-suska.ac.id

Universiti Pendidikan Sultan Idris, Malaysia

https://orcid.org/0000-0001-7750-351X

Shir Li WANG

shirli_wang@meta.upsi.edu.my

Universiti Pendidikan Sultan Idris, Malaysia

https://orcid.org/0000-0003-4417-3213

Hoi Yeh LEE

leehoiyeh@fskik.upsi.edu.my

Universiti Pendidikan Sultan Idris, Malaysia

https://orcid.org/0000-0002-2217-7005

Suliana SULAIMAN

suliana@meta.upsi.edu.my

Universiti Pendidikan Sultan Idris, Malaysia

https://orcid.org/0000-0002-2440-8831

Hasnatul Nazuha HASSAN

nazuha@meta.upsi.edu.my

Universiti Pendidikan Sultan Idris, Malaysia

https://orcid.org/0009-0005-4365-2572

Abstract

Data normalisation is a critical preprocessing step for machine learning, especially for high-dimensional, small-sample datasets such as those encountered in microarray analysis. This study investigates the impact of eight normalisation methods on the classification performance of microarray data: Vector Normalisation (L2 Normalisation), Quantile Normalisation (Gaussian and Uniform), Maximum Absolute Scaling, Z-score, Min-Max, Power Transformation, and Robust Scaling. Using an Extreme Learning Machine (ELM) as the classifier, the research evaluates performance across three leukaemia datasets with 2, 3 and 4 classes. The results show that Vector Normalisation consistently outperforms all other methods. In the 2-class scenario, it achieved the highest accuracy (87.50%) and F1-score (87.08%). Although unnormalised data showed a similar average accuracy, Vector Normalisation proved empirically superior because of its significantly lower standard deviation (10.89), which indicates a more stable and reproducible model. This stability became even more critical in the 3-class scenario, where overall performance declined: Vector Normalisation still led with 61.67% accuracy and a 52.18% F1-score, while other methods, particularly simple scaling techniques such as Min-Max, showed a sharp drop and extreme instability. In the 4-class scenario, performance rebounded and Vector Normalisation maintained its top position, achieving 72.92% accuracy and an F1-score of 66.17%. The findings confirm that Vector Normalisation is the most effective normalisation method for microarray data, delivering both high performance and superior stability across varying levels of class complexity.
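
To make the distinction between the methods concrete, the following minimal Python sketch contrasts Vector (L2) Normalisation with three of the simpler scalers named above. It rests on assumptions that the abstract does not state: that Vector Normalisation is applied per sample (row-wise), that Min-Max, Z-score and Maximum Absolute Scaling are applied per feature (column-wise), and that Z-score uses the population standard deviation. The function names and the toy matrix are hypothetical and are not taken from the paper.

```python
import math
import statistics


def l2_normalise_rows(X):
    """Scale each sample (row) to unit Euclidean length (assumed row-wise)."""
    out = []
    for row in X:
        norm = math.sqrt(sum(v * v for v in row))
        # Leave all-zero rows unchanged to avoid dividing by zero.
        out.append([v / norm for v in row] if norm > 0 else list(row))
    return out


def by_column(X, scale):
    """Apply scale(column) -> column to every feature, then rebuild the rows."""
    cols = [scale(list(col)) for col in zip(*X)]
    return [list(row) for row in zip(*cols)]


def min_max(col):
    """Map a feature linearly onto [0, 1]; constant features become 0."""
    lo, hi = min(col), max(col)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in col]


def z_score(col):
    """Centre a feature on 0 and divide by its population standard deviation."""
    mean = statistics.fmean(col)
    sd = statistics.pstdev(col)
    return [(v - mean) / sd if sd else 0.0 for v in col]


def max_abs(col):
    """Divide a feature by its largest absolute value."""
    m = max(abs(v) for v in col)
    return [v / m if m else 0.0 for v in col]


# Toy matrix: 3 samples x 3 features (illustrative values, not microarray data).
X = [[2.0, 0.0, 1.0],
     [4.0, 3.0, 0.0],
     [6.0, 4.0, 2.0]]

X_l2 = l2_normalise_rows(X)     # every row now has Euclidean norm 1
X_mm = by_column(X, min_max)    # every column now spans [0, 1]
X_z = by_column(X, z_score)     # every column now has mean 0
X_ma = by_column(X, max_abs)    # every column now has max |value| = 1
```

Under these assumptions, the row-wise method depends only on the sample being transformed, whereas the column-wise scalers depend on statistics estimated from all samples. Whether this difference explains the stability reported in the abstract is not something the sketch can establish.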

Keywords:

classification, data normalisation, extreme learning machine, microarray data

Sustainable Development Goals (SDG)

3 - Good health and well-being

PERMANA, I., WANG, S. L., LEE, H. Y., SULAIMAN, S., & HASSAN, H. N. (2026). Data normalisation methods on microarray data. Applied Computer Science, 22(2), 169–179. https://doi.org/10.35784/acs_9078
