The Internet is a global, complex network that connects objects and people. This decentralized network, whose size can hardly be quantified, is growing and changing rapidly. Big Data is, in a sense, a record of the transactions and activities that take place in this network. But what does Big Data mean to us – and what challenges does it pose?
Today’s data volume – better known as Big Data – is not only extremely large and rapidly growing, but also very diverse, because it comes from many different sources: the provision of content, the networking of people and things, and location data. These data sources can all be traced back to the Internet, or more precisely, the Internet of Events. The term reflects the fact that every activity on the Internet is automatically logged as a so-called event.
The 4 V’s of Big Data
Another peculiarity of event data is that its truthfulness is uncertain. For example, the data do not always reveal whether the owner of a smartphone is actually the person using it, or whether the profile of a real person in a social network is actually managed by that person.
This results in four characteristic properties – the 4 V’s – of Big Data:
“Volume” – a large volume of data
“Velocity” – high speed in the changes and growth of data
“Variety” – many different data sources
“Veracity” – uncertainty about the accuracy or truthfulness of the data
The Big Data Challenge
So huge amounts of data sit on hard disks, computers or machines all over the world. These data are nothing more than detailed and accurate information, for example on economic, social or political activities. However, these data often remain completely unused.
Professor Wil van der Aalst, who is also considered the “inventor” of Process Mining, describes the problem in the age of Big Data as follows: “The challenge today is not to generate more data, but the challenge today is to turn this data into real value”. We have data like sand on the beach – but what do we do with all this data? How do we transform the data into usable information?
It’s a long way from collecting the data to actually using it. This path is also not always straightforward: regulatory and legal requirements restrict data access and the scope for action. In addition, different data sources and forms of documentation must be brought into a uniform or at least comprehensible form. After all, what do we gain from data that we do not understand?
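To make the idea of a "uniform form" a little more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the two data sources, their field names, the Event schema and the sample records are hypothetical and do not come from any real system. It only shows the principle that events from different sources, with different field names and timestamp formats, can be mapped onto one common structure and then compared.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    """Uniform event record (hypothetical schema, for illustration only)."""
    source: str
    actor: str
    action: str
    timestamp: datetime


def from_web_log(record: dict) -> Event:
    # Assumed web log format: ISO 8601 timestamp, fields "user" and "event".
    return Event(
        source="web",
        actor=record["user"],
        action=record["event"],
        timestamp=datetime.fromisoformat(record["time"]),
    )


def from_sensor(record: dict) -> Event:
    # Assumed sensor format: Unix epoch seconds, fields "device_id" and "type".
    return Event(
        source="sensor",
        actor=record["device_id"],
        action=record["type"],
        timestamp=datetime.fromtimestamp(record["ts"], tz=timezone.utc),
    )


# Made-up sample records from the two hypothetical sources.
web_record = {"user": "alice", "event": "order_placed",
              "time": "2020-01-01T12:00:00+00:00"}
sensor_record = {"device_id": "thermo-7", "type": "reading", "ts": 1577880060}

# Once both records share one schema, they can be put on a single timeline.
events = sorted(
    [from_sensor(sensor_record), from_web_log(web_record)],
    key=lambda e: e.timestamp,
)

for e in events:
    print(e.timestamp.isoformat(), e.source, e.actor, e.action)
```

In practice this mapping step is usually the laborious part: every additional source needs its own converter, and each converter encodes assumptions about what the fields of that source actually mean.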
Data Science – the solution?
If you take a closer look at the process, more and more questions arise. How do you get access to the relevant data? How can the correctness of the data be verified? How can the data be transformed into a language that is understandable and communicable for us – or at least for the experts among us? How can this data be analyzed, interpreted and used? And, not to be forgotten: who is in a position to do this? Because this problem is so complex, a completely new science has emerged: data science.
In the next article, we will look at how data science answers these questions and what challenges need to be overcome.