Improving Data Quality with Process Mining

May 13, 2019 Franzi

How can process mining improve data quality? Surely the data is the basis of a process analysis, not its result? At the beginning of a process mining project, a company selects a specific process for analysis. Data from various IT systems is then identified, extracted and transformed to make that analysis possible. But what effect does this transformation have on the data itself? Does it merely enable process mining analyses, or does processing the data bring other advantages as well? It does, and more than you might expect.

Understand the relevance of data

After you have selected the specific process for your analysis, you begin identifying the necessary information and data. This data must relate to the selected process and help answer your analytical questions. Data that does not serve the purpose of your analysis should be ignored so that it does not unnecessarily increase the complexity of data extraction and transformation. Take a processing time analysis for customer complaints as an example: customer data, orders, order status and complaint handling time are important, whereas the customer’s place of residence is irrelevant.

It is therefore important to understand which data is of interest for which processes and thus for their analysis. It also becomes clear that data is not a mere by-product of using IT systems, but a decisive resource with the potential for immense information gain.

Discover and use interrelationships

You now know which data belongs to the selected process or is relevant to it. This is often not transparent, especially when data is stored in different IT systems. Because all this data belongs to one and the same process, connections between the data can now be identified more easily, and new relationships can even be derived. These relationships often reveal the true meaning of the data. These insights help you interpret the data, analyze it and understand the results.

You can also detect the absence of process-relevant data in this way. Missing data is often caused by business processes that are poorly represented in the system, or not represented at all. Less commonly, the system is configured incorrectly, so that data is not stored or is stored in a different form. Use these findings to optimize your system processes.
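One way to make such gaps visible is to check each case against the activities you expect the process to contain. The following is a minimal sketch, not a LANA feature: the field names `case_id` and `activity`, the expected activities and the sample events are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical activities we expect to see in every complaint case.
EXPECTED_ACTIVITIES = {"Complaint received", "Complaint reviewed", "Complaint closed"}


def find_missing_activities(events, expected=EXPECTED_ACTIVITIES):
    """Return {case_id: sorted list of expected activities with no recorded event}."""
    seen = defaultdict(set)
    for event in events:
        seen[event["case_id"]].add(event["activity"])
    return {
        case_id: sorted(expected - activities)
        for case_id, activities in seen.items()
        if expected - activities
    }


# Illustrative sample data: case C2 has no recorded review step.
events = [
    {"case_id": "C1", "activity": "Complaint received"},
    {"case_id": "C1", "activity": "Complaint reviewed"},
    {"case_id": "C1", "activity": "Complaint closed"},
    {"case_id": "C2", "activity": "Complaint received"},
    {"case_id": "C2", "activity": "Complaint closed"},
]

missing = find_missing_activities(events)
print(missing)  # {'C2': ['Complaint reviewed']}
```

A case that appears in the result is not necessarily faulty. The step may have happened outside the system, which is exactly the kind of finding described above.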

Overcome system gaps

During the selection of the relevant data, it also becomes clear which IT systems are involved in the selected process and where the required data is located. You need to know this in order to extract the data. Depending on the source, data is available in databases, in files or even in handwritten form. For example, customer data is stored in the CRM system, whereas data for the delivery of an order is stored in the ERP system. LANA Process Mining makes it possible to use data from various sources in an analysis and thus easily overcome system gaps. LANA is a bridge in your system landscape.
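To illustrate what bridging two systems means at the data level, here is a small sketch that links CRM and ERP extracts through a shared key. It does not show how LANA does this internally. The record layouts, the `customer_id` key and the function name `join_on_customer` are hypothetical.

```python
# Hypothetical extracts: customer records from a CRM, delivery records from an ERP.
crm_customers = [
    {"customer_id": "K-100", "name": "Example GmbH"},
    {"customer_id": "K-200", "name": "Sample AG"},
]
erp_deliveries = [
    {"order_id": "O-1", "customer_id": "K-100", "delivery_status": "delivered"},
    {"order_id": "O-2", "customer_id": "K-300", "delivery_status": "open"},
]


def join_on_customer(customers, deliveries):
    """Attach the CRM customer name to each ERP delivery.

    Deliveries whose customer is unknown to the CRM keep name=None,
    so the gap between the two systems stays visible.
    """
    by_id = {customer["customer_id"]: customer for customer in customers}
    joined = []
    for delivery in deliveries:
        customer = by_id.get(delivery["customer_id"])
        joined.append({**delivery, "name": customer["name"] if customer else None})
    return joined


combined = join_on_customer(crm_customers, erp_deliveries)
for row in combined:
    print(row)
```

Keeping unmatched deliveries instead of dropping them is a deliberate choice in this sketch: a delivery without a CRM counterpart is itself a data quality finding.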

Discover the data structure

Once you have identified all the relevant data and systems, you are ready to extract the data. During or immediately after the data extraction, you can see which structure the data has: data with a fixed structure from a database, semi-structured data such as e-mails, or unstructured data where only the data type is known (e.g. images, audio or video files).

The question of relevance arises again within the data itself, this time with regard to structure. You may exclude certain columns in structured data, or certain parameters in semi-structured or unstructured data, because that information is irrelevant for the planned analysis. In our example, one such column is the place of residence of the customer who is complaining. Knowing how these abstract data sets are actually structured gives you not only important information for the process mining analysis, but also a deeper understanding of your virtual IT structures.
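For structured data, excluding irrelevant columns can be as simple as keeping an explicit list of the columns you need. The sketch below uses the complaint example. The column names are assumptions derived from that example, not a prescribed schema.

```python
# Columns assumed relevant for the complaint processing time analysis.
RELEVANT_COLUMNS = ("customer_id", "order_id", "order_status", "handling_time_days")


def keep_relevant(rows, columns=RELEVANT_COLUMNS):
    """Return copies of the rows that contain only the relevant columns."""
    return [{column: row.get(column) for column in columns} for row in rows]


# Illustrative record that still carries the irrelevant place of residence.
rows = [
    {
        "customer_id": "K-100",
        "order_id": "O-1",
        "order_status": "complaint",
        "handling_time_days": 4,
        "place_of_residence": "Berlin",
    }
]

slim = keep_relevant(rows)
print(slim)
```

Listing the columns to keep, rather than the columns to drop, means that fields added to the source system later do not silently enter the analysis.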

Go into the analysis with cleansed and standardized data

Before the extracted data can be used for the analysis, it must be prepared. This preparation usually means cleaning the data, for example by removing duplicate records, and standardizing the data format so that data from different sources can be used together. Removing duplicates and incorrect values noticeably raises data quality.

Depending on the decisions you made in the previous step, you sometimes remove not only individual values but entire columns, such as the place of residence. You should also clarify how to deal with empty values. In general, it is advisable to convert empty values to null values, because many tools handle the data better this way. Retain plausible or justified null values so as not to distort the analyses.
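The two cleaning steps described here, converting empty values to null values and removing duplicate records, might look like this in a minimal form. The function name `clean_rows` and the sample records are hypothetical, and the sketch treats only exact duplicates as duplicates.

```python
def clean_rows(rows):
    """Convert empty strings to None, then drop exact duplicate records.

    The first occurrence of each record is kept. Existing None values
    are retained as they are.
    """
    cleaned = []
    seen = set()
    for row in rows:
        normalized = {
            key: (None if isinstance(value, str) and value.strip() == "" else value)
            for key, value in row.items()
        }
        # Order-independent fingerprint of the record, used to spot duplicates.
        fingerprint = tuple(sorted(normalized.items(), key=lambda item: item[0]))
        if fingerprint not in seen:
            seen.add(fingerprint)
            cleaned.append(normalized)
    return cleaned


# Illustrative data: the first two records differ only in how "empty" is written.
rows = [
    {"order_id": "O-1", "order_status": "complaint", "closed_at": ""},
    {"order_id": "O-1", "order_status": "complaint", "closed_at": "  "},
    {"order_id": "O-2", "order_status": "closed", "closed_at": "2019-05-13"},
]

cleaned = clean_rows(rows)
print(cleaned)
```

Normalizing empty values before comparing records matters: the first two sample records only turn out to be duplicates once both empty variants have become null values.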

Finally, the data is transformed into a standardized format. This step is unnecessary if the data already exists in a standardized format that matches the target format of the analysis tool. Standardizing and cleaning the data increases its quality significantly. For process mining analyses with LANA, the data should have the following structure:

[Figure: data structure for process mining, with start, end and activity]
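As a rough illustration of such a standardized structure, the sketch below brings events with differently formatted timestamps into one common record layout. The field names `case_id`, `activity`, `start` and `end`, the two source timestamp formats and the helper names are assumptions made for this example. It is not LANA's actual import format.

```python
from datetime import datetime

# Assumed source timestamp formats; extend this for the systems you extract from.
KNOWN_FORMATS = ("%Y-%m-%d %H:%M:%S", "%d.%m.%Y %H:%M")


def parse_timestamp(text):
    """Parse a timestamp written in any of the known source formats."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"Unrecognized timestamp format: {text!r}")


def to_event(case_id, activity, start, end):
    """Build one event record with ISO 8601 start and end timestamps."""
    return {
        "case_id": case_id,
        "activity": activity,
        "start": parse_timestamp(start).isoformat(),
        "end": parse_timestamp(end).isoformat(),
    }


# Illustrative events from two hypothetical source systems.
event_log = [
    to_event("C1", "Complaint received", "2019-05-13 09:00:00", "2019-05-13 09:05:00"),
    to_event("C1", "Complaint reviewed", "13.05.2019 10:30", "13.05.2019 11:15"),
]

for event in event_log:
    print(event)
```

Raising an error for unknown formats, instead of guessing, keeps badly formatted values from slipping into the analysis unnoticed.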

Better data quality leads to better insights

As you can see, process mining projects always improve data quality. The data is cleared of redundancies and errors and transformed into a consistent format. In addition, the understanding of the data and its interrelationships increases. This improves not only the quality of the data itself, but also the way you work with the data and the corresponding analyses. Better data enables you to make better-founded decisions that sustainably increase your company’s success.

Take the first step towards better data quality! Talk to our experts about how you can get started with LANA Process Mining.