Data Transformation and Case IDs

November 21, 2019 Franzi

The way to well-transformed process data

The ETL process (extracting, transforming and loading data) is an essential part of any process mining analysis. No analysis is possible without log files, but not every data transformation is a good one. The more conscientiously the ETL process is performed, the more effective the process analysis will ultimately be, because clean data makes the process more transparent.

However, this technical preparatory step, the transformation of data, can often be intimidating due to its apparent complexity. What does it actually mean to transform data “well”?

Data transformation has many different parameters that you should pay attention to. At its core is always the Case ID. It is by far the most important parameter in the entire ETL process, because without it no process analysis can be performed. Precise Case IDs are the most important basis for well-transformed process data.


What is a Case ID?

The Case ID is the unique identifier of a case within a process. (Read the complete definition in the Process Mining Glossary.) This means that each case that passes through the process is assigned an ID number by the system. A “case” can be a transaction in retail, an invoice in purchasing, a product to be manufactured in assembly, or an application in recruiting.

As a result, however, the Case ID is not always easy to recognize. In one process the Case ID can be a customer number, in another a product ID. Here it is an order number, there an employee ID. Identifying the correct Case ID is therefore the first step.

What should you pay attention to when dealing with Case IDs?

A case ID must always be available for successful process mining. Together with the start time and the activity name, the Case ID thus forms one of the basic components of every log file. In certain contexts, however, it can happen that a Case ID is not unique. The same ID may appear for several different cases in the system, or a single case may have several case IDs at the same time – or none at all. Fortunately, there are several solutions for these special cases.
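
The three basic components named above can be illustrated with a very small log. The following is only a sketch: the rows, the tuple layout and the helper `rows_missing_case_id` are hypothetical and not taken from any particular tool. It shows how one might detect rows that have no Case ID before starting an analysis.

```python
from collections import Counter

# Hypothetical event log: each row is (case_id, activity, start_time).
events = [
    ("4711", "Create order", "2019-11-01T09:00"),
    ("4711", "Ship order", "2019-11-02T14:30"),
    (None, "Create invoice", "2019-11-03T08:15"),
    ("4712", "Create order", "2019-11-03T10:00"),
]


def rows_missing_case_id(rows):
    """Return the indices of rows whose case ID is missing or empty."""
    return [i for i, (case_id, _, _) in enumerate(rows) if not case_id]


# Rows that cannot be assigned to any case yet.
missing = rows_missing_case_id(events)

# Number of events recorded per known case ID.
events_per_case = Counter(case_id for case_id, _, _ in events if case_id)
```

A check like this does not reveal whether an ID is reused across different cases; that usually requires knowledge of the process itself.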


Identical Case IDs between multiple cases

A classic example of this is the sales process. Both customer numbers and product IDs would be good candidates for the Case ID. But what if a single customer buys different products one after the other? Or if a product is purchased by different buyers? In both cases, the same ID suddenly appears in different process flows.

The solution: Combine the Case ID with another key figure to create a new, more specific ID. If you use the customer number as a Case ID, combine it with the product number to create an identifier that distinguishes each customer-product combination.
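
Combining two key figures into one identifier could look roughly like this. The field names, the sample rows and the helper `composite_case_id` are assumptions made for illustration; a real log will use its own column names.

```python
# Hypothetical sales events; "customer" and "product" are assumed field names.
purchases = [
    {"customer": "C001", "product": "P10", "activity": "Order placed"},
    {"customer": "C001", "product": "P20", "activity": "Order placed"},
    {"customer": "C002", "product": "P10", "activity": "Order placed"},
    {"customer": "C001", "product": "P10", "activity": "Payment received"},
]


def composite_case_id(row, separator="-"):
    """Build a case ID from the customer number and the product number."""
    return f"{row['customer']}{separator}{row['product']}"


# Attach the combined ID to every event.
for row in purchases:
    row["case_id"] = composite_case_id(row)

# Distinct cases found in the log.
case_ids = sorted({row["case_id"] for row in purchases})
```

Note that this combination still collides if the same customer buys the same product twice. In that situation a further attribute, such as an order date, would have to be added to the key.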

A single case with separate case IDs

A purchase order can trigger multiple deliveries, for example, if not all items are currently available or are delivered from another location. There is a risk of a break in the process, making analysis difficult or even impossible.

For this reason, delivery IDs should be created for the individual deliveries and assigned to the ID of the purchase order. The Case ID, in this case the order ID, refers to the sub-IDs, in this case the delivery IDs. Through this reference, the connection between the purchase order and its deliveries is preserved and the entire process can be analyzed. Conversely, a delivery can contain several purchase orders.
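
One possible way to keep this reference is a lookup from delivery ID to order ID. The sketch below assumes the simple direction only, where each delivery belongs to exactly one purchase order; the mapping, the field names and `resolve_case_id` are all hypothetical.

```python
from collections import defaultdict

# Assumed lookup table: delivery ID -> purchase order ID.
delivery_to_order = {"D-1": "PO-100", "D-2": "PO-100", "D-3": "PO-200"}

# Hypothetical events; delivery events carry no order ID of their own.
events = [
    {"order_id": "PO-100", "delivery_id": None, "activity": "Create purchase order"},
    {"order_id": None, "delivery_id": "D-1", "activity": "Ship delivery"},
    {"order_id": None, "delivery_id": "D-2", "activity": "Ship delivery"},
    {"order_id": None, "delivery_id": "D-3", "activity": "Ship delivery"},
]


def resolve_case_id(event, mapping):
    """Use the order ID if present, otherwise look it up via the delivery ID."""
    if event["order_id"]:
        return event["order_id"]
    return mapping.get(event["delivery_id"])


# Group activities under the purchase order they belong to.
activities_by_case = defaultdict(list)
for event in events:
    activities_by_case[resolve_case_id(event, delivery_to_order)].append(event["activity"])
```

If a delivery can contain several purchase orders, a plain dictionary is no longer enough; the lookup would then need to return a list of order IDs and the event would be assigned to each of them.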

A case without any Case ID

Whether due to a system error or incomplete process documentation, under certain circumstances it can happen that individual cases or even entire log files have no usable case IDs. As mentioned before, the existence of Case IDs is a prerequisite for process analysis. Such situations must therefore be resolved immediately.

If the process context does not yield any obvious Case IDs, the only option is to generate new IDs yourself. You must then assign these IDs to the corresponding process steps in order to complete your log file.
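
How new IDs are generated depends entirely on the process. As one illustrative heuristic, and only under the strong assumptions that the events are already in chronological order, that cases do not overlap, and that every case begins with a known start activity, a running counter could be used. The function name, the `GEN` prefix and the sample activities are hypothetical.

```python
def assign_generated_case_ids(activities, start_activity, prefix="GEN"):
    """Label each activity with a generated case ID.

    A new ID is started whenever the start activity appears
    (and for the very first event, whatever it is).
    """
    case_number = 0
    labelled = []
    for activity in activities:
        if activity == start_activity or case_number == 0:
            case_number += 1
        labelled.append((f"{prefix}-{case_number:04d}", activity))
    return labelled


# Hypothetical recruiting log without any case IDs.
activities = [
    "Receive application",
    "Screen application",
    "Receive application",
    "Invite candidate",
]
labelled_events = assign_generated_case_ids(activities, "Receive application")
```

When cases run in parallel, this heuristic assigns steps to the wrong case, so additional attributes such as a resource or a reference number are needed to separate them.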

Thus, it becomes apparent that the inconspicuous sequences of numbers conceal extremely relevant building blocks for effective process analysis. Handling Case IDs is not always easy, but the effort to keep your log files clean and uniform is worth it.

Do you want to know how to get your data into the right process mining format? Then contact us!