Analyze the effectiveness of ETL processes implemented using SQL and Apache HiveQL languages

Krzysztof Litka

doi:10.35784/jcsi.3674

Published: Sep 30, 2023

Issue Vol. 28 (2023)

Authors

Krzysztof Litka

s99174@pollub.edu.pl

Lublin University of Technology, Poland

Abstract

In the era of digitization, when data is collected in ever-increasing quantities, efficient data processing is essential. The article analyzes the performance of ETL processes written in SQL and HiveQL for scenarios of varying complexity, focusing on the execution time of individual queries. The tools used in the study are also discussed. The results for each language are summarized and compared, highlighting their strengths and weaknesses and identifying their possible areas of application.

Keywords:

ETL, SQL, HiveQL

Litka, K. (2023). Analyze the effectiveness of ETL processes implemented using SQL and Apache HiveQL languages. Journal of Computer Sciences Institute, 28, 204–209. https://doi.org/10.35784/jcsi.3674