The comparative analysis of modern ETL tools
Vitalii Mayuk
Department of Computer Science, Lublin University of Technology (Poland)
Ivan Falchuk
Department of Computer Science, Lublin University of Technology (Poland)
Piotr Muryjas
p.muryjas@pollub.pl, Lublin University of Technology
Abstract
Every data warehouse must be loaded with properly processed transactional data. The process that performs this task is known as extract-transform-load (ETL). The efficiency of its implementation determines how quickly users gain access to current analytical data. The paper presents the results of research on the efficiency of the individual ETL stages performed with Azure Synapse (AS) and Azure Data Factory (ADF). The research covered selecting, sorting, and aggregating data, joining tables, and loading data into target tables. Execution time was used as the criterion for evaluating the efficiency of these operations. The results indicate that ADF loads transactional data into the data warehouse considerably faster than AS.
Keywords:
Azure Synapse; Azure Data Factory; ETL tools
License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.