Data Extraction

Data extraction is the process of retrieving data from a source system. In the context of Process Mining, this means that event data is extracted from an IT system so that it can then be transformed and used for analyses.

What are the extraction methods?

There are different methods for extracting data, depending on the IT system involved and the data format required. Some systems, such as SAP ERP 4.0, can export data to a file, for example as a CSV file, at the push of a button. Other programs have to be accessed through their API (Application Programming Interface), in which case the data connection is established programmatically. An API can be, for example, a database interface such as JDBC (Java Database Connectivity), a protocol, or a web-based interface such as a REST API. A REST API often returns its output as JSON or XML. However, APIs differ from one another in data structure, formats, objects, variables and remote calls, so each one has to be addressed specifically. Information about these differences can usually be found in the API documentation.
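To illustrate why each API has to be addressed specifically, the following minimal Python sketch parses the same event from a JSON response and from an XML response. Both payloads, their field names (case_id, activity, timestamp, case, name, time) and the two helper functions are hypothetical and chosen only for illustration; a real API will define its own structure in its documentation.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical payloads: the same event as two different APIs might return it.
JSON_PAYLOAD = (
    '{"events": [{"case_id": "4711", "activity": "Create Order", '
    '"timestamp": "2024-01-05T09:30:00"}]}'
)
XML_PAYLOAD = """<events>
  <event case="4711">
    <name>Create Order</name>
    <time>2024-01-05T09:30:00</time>
  </event>
</events>"""


def events_from_json(payload: str) -> list[dict]:
    """Read events from the assumed JSON structure."""
    data = json.loads(payload)
    return [
        {
            "case_id": e["case_id"],
            "activity": e["activity"],
            "timestamp": e["timestamp"],
        }
        for e in data["events"]
    ]


def events_from_xml(payload: str) -> list[dict]:
    """Read events from the assumed XML structure (different names, same content)."""
    root = ET.fromstring(payload)
    return [
        {
            "case_id": ev.get("case"),          # attribute instead of field
            "activity": ev.findtext("name"),    # child element text
            "timestamp": ev.findtext("time"),
        }
        for ev in root.iter("event")
    ]
```

Although both responses carry the same information, each needs its own parsing logic before the events end up in a common structure.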

How does data extraction work?

How the data extraction takes place depends on the extraction method chosen.
If you export data manually through a graphical user interface, you only need to select the required data (tables and so on) and export it.

However, if the data is exported using the API of the program, the procedure is usually as follows:

1. Evaluate the formats, data structure, objects, variables and remote calls of the API
2. Query the API for the required data
3. Save the response data in the desired format

All these steps are usually implemented in a query script or workflow. After data extraction, the data is transformed if required.
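A query script covering the three steps above might look like the following minimal Python sketch. It assumes a hypothetical web API that returns JSON with a top-level "events" list; the column names in FIELDS and the function names are likewise assumptions made for illustration, standing in for what step 1 would reveal about a real API.

```python
import csv
import json
from urllib.request import urlopen

# Step 1 (result of evaluating the API): the assumed fields of each event.
FIELDS = ["case_id", "activity", "timestamp"]


def fetch_json(url: str) -> dict:
    """Step 2: query the API and decode its JSON response."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def extract_events(url: str, fetch=fetch_json) -> list[dict]:
    """Query the API and keep only the required fields of each event."""
    payload = fetch(url)
    return [
        {field: event.get(field, "") for field in FIELDS}
        for event in payload["events"]  # "events" key is an assumption
    ]


def write_csv(events: list[dict], out) -> None:
    """Step 3: save the response data in the desired format (here CSV)."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(events)
```

In use, the events would be written to a file opened with `open("events.csv", "w", newline="")`. Passing the fetch function as a parameter is a design choice that lets the script be tried out against a stub before it is pointed at a real system.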


Related terms: ETL, Process Mining, Data Transformation