Data normalisation methods on microarray data
Issue: Applied Computer Science, Vol. 22 No. 2 (2026), dated 2026-06-30
Inggih PERMANA, Shir Li WANG, Hoi Yeh LEE, Suliana SULAIMAN, Hasnatul Nazuha HASSAN
pp. 169-179
Abstract
Data normalisation is a critical preprocessing step in machine learning, especially for high-dimensional, small-sample datasets such as those encountered in microarray analysis. This study investigates the impact of eight normalisation methods, namely Vector Normalisation (L2 Normalisation), Quantile Normalisation (Gaussian and Uniform), Maximum Absolute Scaling, Z-score, Min-Max, Power Transformation and Robust Scaling, on the classification performance of microarray data. Using an Extreme Learning Machine (ELM) as the classifier, the study evaluates performance on three leukaemia datasets with 2, 3 and 4 classes, respectively. Vector Normalisation consistently outperformed all other methods. In the 2-class scenario, it achieved the highest accuracy (87.50%) and F1-score (87.08%). Although unnormalised data reached a similar average accuracy, Vector Normalisation was empirically superior because of its substantially lower standard deviation (10.89), indicating a more stable and reproducible model. Stability became even more important in the 3-class scenario: overall performance declined, yet Vector Normalisation still led with 61.67% accuracy and a 52.18% F1-score, whereas other methods, particularly simple scaling techniques such as Min-Max, showed a sharp drop in performance and extreme instability. In the 4-class scenario, performance recovered, and Vector Normalisation kept its top position with 72.92% accuracy and a 66.17% F1-score. These findings show that Vector Normalisation is the most effective of the eight tested methods for this microarray data, delivering both high performance and superior stability across varying levels of class complexity.
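The abstract does not give implementation details, so the following is only a minimal sketch, assuming NumPy, of the kind of pipeline it describes: apply a normalisation method (fitting any per-gene statistics on the training split only), train a basic ELM, and report the mean and standard deviation of test accuracy over repeated random splits, which is the mean-and-spread comparison behind the stability argument. The helper names (`fit_scaler`, `ELM`, `evaluate`), the sigmoid activation, the hidden-layer size, the 70/30 split and the number of repeats are hypothetical choices, not the authors' settings. Vector Normalisation is assumed to be row-wise (each sample rescaled to unit L2 norm, as in scikit-learn's `Normalizer`); the two quantile variants and the power transformation are omitted here and are available in scikit-learn as `QuantileTransformer` (with `output_distribution="normal"` or `"uniform"`) and `PowerTransformer`.

```python
import numpy as np


def _safe(scale):
    # Replace zero (or near-zero) scales by 1 so constant genes do not divide by 0.
    return np.where(scale > 1e-12, scale, 1.0)


def fit_scaler(X_train, method):
    """Hypothetical helper: return a function applying `method`,
    with statistics estimated from X_train only."""
    if method == "none":
        return lambda X: X
    if method == "vector":
        # Assumed row-wise L2: each sample (one array) is rescaled to unit length.
        return lambda X: X / _safe(np.linalg.norm(X, axis=1, keepdims=True))
    if method == "zscore":
        mu, sd = X_train.mean(axis=0), X_train.std(axis=0)
        return lambda X: (X - mu) / _safe(sd)
    if method == "minmax":
        lo, hi = X_train.min(axis=0), X_train.max(axis=0)
        return lambda X: (X - lo) / _safe(hi - lo)
    if method == "maxabs":
        m = np.abs(X_train).max(axis=0)
        return lambda X: X / _safe(m)
    if method == "robust":
        q1, med, q3 = np.percentile(X_train, [25, 50, 75], axis=0)
        return lambda X: (X - med) / _safe(q3 - q1)
    raise ValueError(f"unsupported method: {method}")


class ELM:
    """Basic single-hidden-layer ELM: random (untrained) input weights and
    biases, sigmoid hidden layer, output weights from the pseudo-inverse."""

    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        Z = np.clip(X @ self.W + self.b, -500.0, 500.0)  # keep exp() finite
        return 1.0 / (1.0 + np.exp(-Z))

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        T = (y[:, None] == self.classes_[None, :]).astype(float)  # one-hot targets
        self.W = self.rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        self.beta = np.linalg.pinv(self._hidden(X)) @ T  # beta = pinv(H) T
        return self

    def predict(self, X):
        return self.classes_[np.argmax(self._hidden(X) @ self.beta, axis=1)]


def evaluate(X, y, method, n_repeats=10, test_frac=0.3, n_hidden=50, seed=0):
    """Mean and standard deviation of test accuracy over repeated random splits."""
    rng = np.random.default_rng(seed)
    n_test = int(round(test_frac * len(y)))
    accs = []
    for r in range(n_repeats):
        idx = rng.permutation(len(y))
        te, tr = idx[:n_test], idx[n_test:]
        scale = fit_scaler(X[tr], method)  # statistics from the training split only
        model = ELM(n_hidden=n_hidden, seed=r).fit(scale(X[tr]), y[tr])
        accs.append(np.mean(model.predict(scale(X[te])) == y[te]))
    return float(np.mean(accs)), float(np.std(accs))


# Synthetic stand-in data (NOT the leukaemia datasets): 60 samples x 500 genes.
rng = np.random.default_rng(42)
X = rng.lognormal(mean=5.0, sigma=1.0, size=(60, 500))
y = np.repeat([0, 1], 30)
X[y == 1, :20] *= 3.0  # make the first 20 synthetic genes informative
for method in ["none", "vector", "zscore", "minmax", "maxabs", "robust"]:
    mean_acc, sd_acc = evaluate(X, y, method)
    print(f"{method:>7}: accuracy {mean_acc:.3f} +/- {sd_acc:.3f}")
```

Fitting the per-gene statistics on the training split keeps test-set information out of the scaling; Vector Normalisation needs no fitted statistics because it rescales each sample independently of the others.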
Sustainable Development Goals (SDG): SDG 3, Good health and well-being
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
All articles published in Applied Computer Science are open-access and distributed under the terms of the Creative Commons Attribution 4.0 International License.
