Analysis of data processing efficiency with use of Apache Hive and Apache Pig in Hadoop environment
Issue Vol. 30 (2024)
Mikołaj Skrzypczyński, Piotr Muryjas, pp. 1-8
Authors
mikolaj.skrzypczynski@pollub.edu.pl
Abstract
The aim of this paper is to analyze the efficiency of data processing with Apache Hive and Apache Pig in a Hadoop environment. The analysis compares the two tools on a large data set of 28 million records. The research used scripts and queries written for Apache Hive and Apache Pig, each executed 10 times in an environment provided by a purpose-built virtual machine. These runs were performed on the same data sets 16 times, according to previously prepared research scenarios. The authors conclude that Apache Hive is a more efficient tool than Apache Pig.
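The repeated-execution approach described in the abstract can be illustrated with a minimal timing sketch. This is not the authors' test harness: the function names `time_runs`, `summarize`, and `placeholder_task` are hypothetical, and the placeholder task stands in for whatever command would launch a single Hive query or Pig script. The sketch only shows how a fixed number of runs might be timed and reduced to summary statistics.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_runs(task: Callable[[], None], repeats: int = 10) -> List[float]:
    """Run `task` `repeats` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        task()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: List[float]) -> Dict[str, float]:
    """Reduce a list of run times to mean, minimum and maximum."""
    return {
        "mean": statistics.mean(durations),
        "min": min(durations),
        "max": max(durations),
    }


def placeholder_task() -> None:
    # Hypothetical stand-in (an assumption, not from the paper) for launching
    # one Hive query or one Pig script and waiting for it to finish.
    sum(range(10_000))


if __name__ == "__main__":
    # Ten repetitions, matching the count stated in the abstract.
    print(summarize(time_runs(placeholder_task, repeats=10)))
```

Running the same harness once per tool and per scenario would give comparable summaries. Reporting the minimum and maximum alongside the mean helps show how much individual runs vary.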
License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
