A comparative analysis of the performance of the relational database and the Hadoop environment in the context of analytical data processing

Michał Zadrąg

michal.zadrag@pollub.edu.pl
Lublin University of Technology (Poland)

Abstract

The article presents a detailed comparative analysis of the performance of a Microsoft SQL Server relational database and an Apache Hadoop environment in the context of analytical data processing. The study was carried out by executing more than a dozen research scenarios with different queries on datasets of varying sizes. For each research scenario, the average query execution time on different datasets was compared. Based on the results, it was found that the average execution time of queries from the presented scenarios is significantly shorter in MS SQL Server than in Apache Hadoop.


Keywords:

Apache Hadoop, SQL Server, relational database, OLAP

Published
2023-09-30

How to cite

Zadrąg, M. (2023). A comparative analysis of the performance of the relational database and the Hadoop environment in the context of analytical data processing. Journal of Computer Sciences Institute, 28, 210–216. https://doi.org/10.35784/jcsi.3675
