The comparative analysis of modern ETL tools


Abstract

Every data warehouse must be loaded with properly processed transactional data. The process that performs this task is known as extract-transform-load (ETL). The efficiency of its implementation determines how quickly users gain access to current analytical data. The paper presents the results of research on the efficiency of the ETL process and its stages as implemented with Azure Synapse (AS) and Azure Data Factory (ADF). The research covered selecting, sorting, and aggregating data, joining tables, and loading data into target tables. Execution time was used as the criterion for evaluating the efficiency of these operations. The results indicate that ADF provides much higher time efficiency than AS when loading transactional data into the data warehouse.


Keywords

Azure Synapse; Azure Data Factory; ETL tools

Published: 2021-06-30


Mayuk, V., Falchuk, I., & Muryjas, P. (2021). The comparative analysis of modern ETL tools. Journal of Computer Sciences Institute, 19, 126-131. https://doi.org/10.35784/jcsi.2631

Vitalii Mayuk
Department of Computer Science, Lublin University of Technology, Poland
Ivan Falchuk
Department of Computer Science, Lublin University of Technology, Poland
Piotr Muryjas (p.muryjas@pollub.pl)
Lublin University of Technology