The comparative analysis of modern ETL tools

Vitalii Mayuk


Department of Computer Science, Lublin University of Technology (Poland)

Ivan Falchuk


Department of Computer Science, Lublin University of Technology (Poland)

Piotr Muryjas

p.muryjas@pollub.pl
Lublin University of Technology

Abstract

Every data warehouse must be loaded with properly processed transactional data. The process that performs this task is known as extract-transform-load (ETL). The efficiency of its implementation affects how quickly users gain access to current analytical data. The paper presents the results of research on the efficiency of individual ETL stages performed with Azure Synapse (AS) and Azure Data Factory (ADF). The research covered selecting, sorting and aggregating data, joining tables, and loading data into target tables. Execution time was used as the criterion for evaluating the efficiency of these operations. The results indicate that ADF provides much higher time efficiency than AS when loading transactional data into the data warehouse.
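The evaluation criterion described above (execution time per ETL operation) can be illustrated with a small local sketch. It does not use Azure Synapse or Azure Data Factory and says nothing about their performance. It only shows, under stated assumptions, how selection, joining, aggregation, sorting and loading can each be timed as separate stages. The sample tables, the function names (`select_rows`, `join_products`, `aggregate`, `sort_rows`, `load`, `run_pipeline`) and the stage order are hypothetical choices made for illustration, and the in-memory list stands in for a real target table.

```python
import time
from collections import defaultdict

# Hypothetical source data: transactional rows and a small product dimension.
SALES = [
    {"order_id": 1, "product_id": 10, "qty": 2, "price": 5.0},
    {"order_id": 2, "product_id": 20, "qty": 1, "price": 12.5},
    {"order_id": 3, "product_id": 10, "qty": 4, "price": 5.0},
    {"order_id": 4, "product_id": 30, "qty": 0, "price": 7.0},
]
PRODUCTS = {10: "Pen", 20: "Notebook", 30: "Stapler"}


def timed(name, fn, *args, timings):
    """Run one ETL stage and record its wall-clock execution time in seconds."""
    start = time.perf_counter()
    result = fn(*args)
    timings[name] = time.perf_counter() - start
    return result


def select_rows(rows):
    # Selection: keep only rows with a positive quantity.
    return [r for r in rows if r["qty"] > 0]


def join_products(rows):
    # Join: attach the product name from the (hypothetical) dimension table.
    return [{**r, "product": PRODUCTS[r["product_id"]]} for r in rows]


def aggregate(rows):
    # Aggregation: total revenue per product.
    totals = defaultdict(float)
    for r in rows:
        totals[r["product"]] += r["qty"] * r["price"]
    return list(totals.items())


def sort_rows(pairs):
    # Sorting: highest revenue first.
    return sorted(pairs, key=lambda p: p[1], reverse=True)


def load(pairs, target):
    # Load: append processed rows to an in-memory stand-in for a target table.
    target.extend(pairs)
    return target


def run_pipeline():
    timings, target = {}, []
    rows = timed("select", select_rows, SALES, timings=timings)
    rows = timed("join", join_products, rows, timings=timings)
    pairs = timed("aggregate", aggregate, rows, timings=timings)
    pairs = timed("sort", sort_rows, pairs, timings=timings)
    timed("load", load, pairs, target, timings=timings)
    return target, timings


if __name__ == "__main__":
    table, timings = run_pipeline()
    print(table)
    for stage, seconds in timings.items():
        print(f"{stage}: {seconds * 1000:.3f} ms")
```

Timing each stage separately, rather than only the whole pipeline, makes it possible to compare tools operation by operation, which matches the per-operation comparison the abstract describes. On data this small the measured times are dominated by interpreter overhead and are not meaningful as benchmarks.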


Keywords:

Azure Synapse; Azure Data Factory; ETL tools


Published
2021-06-30

Mayuk, V., Falchuk, I., & Muryjas, P. (2021). The comparative analysis of modern ETL tools. Journal of Computer Sciences Institute, 19, 126–131. https://doi.org/10.35784/jcsi.2631
