ETL (Extract, Transform, Load)
“ETL” stands for Extract, Transform, Load. It describes a process in which data is extracted from one or more source systems, transformed, and loaded into a target system. In the context of Process Mining, the target system is a Process Mining tool.
How does the ETL process work?
In data extraction, the portion of the data that is of interest is retrieved from the various data sources. These extracts can differ in schema, size, and granularity, so they cannot be analyzed together as they are. Data extraction therefore provides the raw material for the next step, data transformation.
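The following is a minimal sketch of extraction, assuming two source systems that deliver delimited text exports. The system names, column names, and sample rows are hypothetical and only illustrate how extracts can arrive with different schemas and formats.

```python
import csv
import io

# Hypothetical exports from two source systems (all names and values are assumptions).
ERP_EXPORT = """order_id;status;changed_at
1001;CREATED;03.01.2024 09:15
1001;SHIPPED;05.01.2024 14:30
"""

CRM_EXPORT = """CaseNo,Event,Timestamp
1001,Complaint received,2024-01-08T10:00:00
"""


def extract(text, delimiter):
    """Read one delimited export and return its rows as dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


erp_rows = extract(ERP_EXPORT, ";")
crm_rows = extract(CRM_EXPORT, ",")

# The two extracts use different column names and date formats,
# so they cannot be combined without a transformation step.
print(sorted(erp_rows[0]))
print(sorted(crm_rows[0]))
```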
In data transformation, the result of the data extraction is processed further and modified. The goal is a uniform data schema. Data transformation is part of data preprocessing, and the modification can be syntactic, semantic, or both.
In syntactic data transformation, formal aspects are adapted. For example, date formats are converted to a single format for consistency or easier processing. This does not change the meaning of the data. Syntactic transformation also includes renaming cryptic field names to make them easier to read.
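As an illustration, a syntactic transformation could look like the sketch below. The cryptic column names, the readable target names, and the list of date formats are assumptions made for this example. Only the form of the data changes, not its meaning.

```python
from datetime import datetime

# Hypothetical mapping from cryptic source column names to readable ones.
COLUMN_NAMES = {"ord_id": "case_id", "stat_cd": "activity", "chg_ts": "timestamp"}

# Date formats assumed to occur in the extracts.
DATE_FORMATS = ("%d.%m.%Y %H:%M", "%Y-%m-%dT%H:%M:%S", "%m/%d/%Y %H:%M")


def normalize_timestamp(value):
    """Return the timestamp in ISO 8601 form, trying each known format in turn."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).isoformat()
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {value!r}")


def transform_syntactic(row):
    """Rename the columns and unify the date format of a single row."""
    renamed = {COLUMN_NAMES.get(key, key): value for key, value in row.items()}
    renamed["timestamp"] = normalize_timestamp(renamed["timestamp"])
    return renamed


row = {"ord_id": "1001", "stat_cd": "SHIPPED", "chg_ts": "05.01.2024 14:30"}
clean = transform_syntactic(row)
print(clean)
```

Raising an error for unknown formats, rather than passing the value through unchanged, is a deliberate choice in this sketch: it keeps inconsistent dates from reaching the analysis unnoticed.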
In semantic data transformation, the data is enriched with meaningful information, aggregated, or converted into other units. This makes the data sets more meaningful.
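A semantic transformation might be sketched as follows. The example aggregates events per case and converts the time between first and last event into hours. The field names, the sample events, and the choice of this particular measure are assumptions for illustration only.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical events that already share a uniform schema.
events = [
    {"case_id": "1001", "activity": "Order created", "timestamp": "2024-01-03T09:15:00"},
    {"case_id": "1001", "activity": "Order shipped", "timestamp": "2024-01-05T14:30:00"},
    {"case_id": "1002", "activity": "Order created", "timestamp": "2024-01-04T08:00:00"},
    {"case_id": "1002", "activity": "Order shipped", "timestamp": "2024-01-04T20:00:00"},
]


def duration_hours_per_case(events):
    """Aggregate events per case and return the span from first to last event in hours."""
    timestamps_by_case = defaultdict(list)
    for event in events:
        timestamps_by_case[event["case_id"]].append(
            datetime.fromisoformat(event["timestamp"])
        )
    return {
        case_id: (max(stamps) - min(stamps)).total_seconds() / 3600  # seconds to hours
        for case_id, stamps in timestamps_by_case.items()
    }


durations = duration_hours_per_case(events)
print(durations)
```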
The result of the data transformation is a modified data set that is loaded into the end system.
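The load step depends on the end system. As a rough sketch, assuming the end system accepts a CSV file with one event per row, the transformed data set could be written like this. The column names and the in-memory target are assumptions. A real tool may expect a different format or a database connection.

```python
import csv
import io

# Assumed target schema: one row per event.
FIELDNAMES = ["case_id", "activity", "timestamp"]


def load(events, target):
    """Write the transformed events as CSV to an open text target."""
    writer = csv.DictWriter(target, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(events)


events = [
    {"case_id": "1001", "activity": "Order created", "timestamp": "2024-01-03T09:15:00"},
    {"case_id": "1001", "activity": "Order shipped", "timestamp": "2024-01-05T14:30:00"},
]

# An in-memory buffer stands in for the file or upload that a real tool would receive.
buffer = io.StringIO()
load(events, buffer)
output = buffer.getvalue()
print(output)
```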
Why is ETL so important?
In the last step, the data is loaded into a Process Mining tool. The phrase “garbage in, garbage out” plays an important role here: an incorrect or incomplete data set is very likely to produce an incorrect or incomplete analysis result. Once the data has been successfully loaded into the tool, Process Mining can begin.
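To reduce the risk of “garbage in”, the data set can be checked before it is loaded. The sketch below assumes that each event needs a case ID, an activity, and an ISO 8601 timestamp. These required fields and the sample rows are assumptions for illustration, and the actual requirements depend on the tool in use.

```python
from datetime import datetime

# Assumed minimum fields for each event.
REQUIRED_FIELDS = ("case_id", "activity", "timestamp")


def validate(events):
    """Return a list of readable problems; an empty list means none were found."""
    problems = []
    for index, event in enumerate(events):
        for field in REQUIRED_FIELDS:
            if not event.get(field):
                problems.append(f"row {index}: missing {field}")
        timestamp = event.get("timestamp")
        if timestamp:
            try:
                datetime.fromisoformat(timestamp)
            except ValueError:
                problems.append(f"row {index}: invalid timestamp {timestamp!r}")
    return problems


events = [
    {"case_id": "1001", "activity": "Order created", "timestamp": "2024-01-03T09:15:00"},
    {"case_id": "", "activity": "Order shipped", "timestamp": "2024-01-05T14:30:00"},
    {"case_id": "1002", "activity": "Order created", "timestamp": "03.01.2024"},
]

problems = validate(events)
for problem in problems:
    print(problem)
```

A check like this only catches formal defects such as empty fields or unparseable dates. Whether the data is complete with respect to the real process still has to be judged separately.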