Searching for the term “types of data” can often leave you more confused than before. There doesn’t seem to be a clear list of types of data. Some authors say there are 3 types, some even say 13. What makes it confusing is that there are many types and kinds of data, that they often overlap, and that different words are used to describe the same type. This makes it difficult to distinguish one from another.
The reality is that there are many different types of data, serving various purposes for different business goals. Here, the most important types of data for your business are explained, such as big data, dark data, machine data and more.
Big Data is a term heard often in the last couple of years. It is data characterized by its volume, variety, and velocity. This means that it's so voluminous and complex that it can’t be analyzed with the traditional business intelligence tools. In the last decade, companies were all about learning to deal with Big Data and using it to make better-informed business decisions.
Big data comes from everywhere and it’s nothing new. However, it stays new because the definition and structure of big data are constantly changing. This happens due to the rapid and constant growth of data. For example, big data used to be expressed in terabytes, while now it’s expressed in exabytes.
In contrast to Big Data, Smart Data is actionable, makes sense, and has a clear purpose. It's not about the volume of the data you are collecting - it's about the actions you take in response to that data. It's a concept that developed along with algorithm-based technologies such as artificial intelligence and machine learning.
Smart Data is usually generated close to the data source, using edge computing technologies. So instead of collecting all data from a source, we process the data at the source and end up sending only the valuable data, the Smart Data, to our data lake.
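As a rough illustration of processing data at the source, the sketch below reduces a batch of raw sensor readings to a small summary before anything is sent to the data lake. The function name `summarize_at_edge`, the temperature threshold, and the sample readings are all assumptions made up for this example; a real edge deployment would depend on the devices and platform involved.

```python
from statistics import mean

# Hypothetical threshold: readings above this value are worth forwarding as-is.
TEMPERATURE_ALERT_C = 80.0


def summarize_at_edge(readings, threshold=TEMPERATURE_ALERT_C):
    """Reduce raw readings to a compact summary plus the notable values."""
    if not readings:
        return None
    return {
        "count": len(readings),
        "mean": round(mean(readings), 2),
        "max": max(readings),
        # Only readings above the threshold are kept individually.
        "alerts": [r for r in readings if r > threshold],
    }


# Made-up raw readings collected on the device.
raw_readings = [71.2, 70.9, 71.5, 84.3, 71.0]

# Only this summary would be sent on to the data lake.
summary = summarize_at_edge(raw_readings)
print(summary)
```

The point of the design is that five raw values become one small record, and only the unusual reading is preserved in full.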
Dark Data is the data that lies below the surface, hidden within a company's internal networks. It holds large amounts of relevant information that can be moved to the data lake to generate vital business and operational insights. This type of data forms the biggest part of all existing data: according to KPMG research, 80% of data is dark data, which means there are a lot of hidden opportunities lying in it.
The many sources that were ignored or hard to access can now be treated as data gold mines. This is possible thanks to cutting-edge technology that can reach into the places that hold dark data.
Simply put, machine data is the data created by the systems, technologies, and infrastructure powering modern businesses. It comes from places like control and operational systems, sensors and IoT, and your industrial network.
If made accessible and usable, machine data can help organizations troubleshoot problems, identify threats and use machine learning to predict future issues.
Whether you are a machine manufacturer or you have assets and devices in your business that hold relevant operational information, being able to gather device data for observability and for analysis of usage and performance is a must. With machine data, you can feed cloud analytics and other advanced service systems.
Transactional data is the most basic form of data and the easiest to understand. It is data describing an event or business activity, such as orders, invoices, production activities, or the hiring and firing of employees.
Master data is the set of key elements that transactional data refers to. It describes the places, parties, and things (products, items) involved in the activity. As Rendy Dalimunthe explains:
“The trip data in a cab company may contain driver, passenger, route, and fare data. The driver, passenger, locations, and basic fare data are the master data. The driver data may consist of the name of the driver and all of the associated information. So does the passenger data. Together, they make up the transactional data.”
Reference data is a subset of master data. It is usually standardized data governed by some form of codification, and it defines the values that other data fields may use. These values are consistent and do not change much over time. Examples of reference data are units of measurement, country codes, and corporate codes.
Let's use Dalimunthe’s cab company example again to highlight the difference between master data and reference data:
“Tomorrow, the day after tomorrow, or next week, the list of drivers may change whenever there’s a new person on board or kicked out.” This changes the master data, but as Dalimunthe also points out, it is unlikely that the list of existing countries will change in the next decades.
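To make the distinction between the three types concrete, here is a minimal sketch of the cab company example as Python data classes. The class names, field names, and sample values are hypothetical and chosen only for illustration; they are not taken from Dalimunthe's article.

```python
from dataclasses import dataclass

# Reference data: standardized codes that rarely change.
COUNTRY_CODES = {"GB": "United Kingdom", "ID": "Indonesia"}


@dataclass(frozen=True)
class Driver:
    """Master data: a party involved in the activity."""
    driver_id: int
    name: str
    country_code: str  # must be a key of the reference data


@dataclass(frozen=True)
class Passenger:
    """Master data: the other party involved in the activity."""
    passenger_id: int
    name: str


@dataclass(frozen=True)
class Trip:
    """Transactional data: one event that links master data together."""
    trip_id: int
    driver: Driver
    passenger: Passenger
    route: str
    fare: float


# Made-up sample records.
driver = Driver(driver_id=1, name="Alex", country_code="GB")
passenger = Passenger(passenger_id=7, name="Sam")
trip = Trip(trip_id=1001, driver=driver, passenger=passenger,
            route="Camden to Soho", fare=18.50)

# The trip resolves the driver's country through the reference data.
driver_country = COUNTRY_CODES[trip.driver.country_code]
print(driver_country)
```

The list of `Driver` records can change from week to week, while `COUNTRY_CODES` stays essentially fixed, which mirrors the difference between master and reference data.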
Reporting data is aggregated data compiled for the purpose of analytics and reporting. It consists of transactional, master, and reference data: for example, trip data (transaction + master) on the 13th day of July in the Greater London region (reference). Reporting data is very strategic and is usually produced as an ingredient of the decision-making process.
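The aggregation described above can be sketched in a few lines of Python. The trip records, the regions other than Greater London, and the function name `build_report` are invented for this example; a real report would typically be produced by a database or BI tool.

```python
from collections import defaultdict

# Hypothetical trip records: (day, region, fare).
trips = [
    ("13 July", "Greater London", 18.50),
    ("13 July", "Greater London", 7.25),
    ("13 July", "Manchester", 12.00),
    ("14 July", "Greater London", 9.75),
]


def build_report(trip_records):
    """Aggregate trips into one row per (day, region)."""
    totals = defaultdict(lambda: {"trips": 0, "revenue": 0.0})
    for day, region, fare in trip_records:
        row = totals[(day, region)]
        row["trips"] += 1
        row["revenue"] += fare
    return dict(totals)


report = build_report(trips)

# The reporting row for the 13th of July in Greater London.
london_13_july = report[("13 July", "Greater London")]
print(london_13_july)
```

Each report row no longer describes a single trip; it summarizes many transactions along reference dimensions such as day and region.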
Metadata is the term used to refer to data that describes other data. More concretely, metadata describes the structure of other data and some of its meaning. It captures things you might not see at first sight, such as how the data is used, who created it, who uses it, and how it relates to other data. You could see it as a little book that contains all you need to know about your data. Simply put, metadata is data about data.
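As a small sketch of what such a "little book" might look like, the example below describes a dataset's structure, creator, users, and relations as a plain Python record. All names and values (`DatasetMetadata`, `ColumnInfo`, the teams, the columns) are assumptions for illustration and do not follow any particular metadata standard.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnInfo:
    """Describes one field of the dataset: its name, type, and meaning."""
    name: str
    dtype: str
    description: str


@dataclass
class DatasetMetadata:
    """Data about a dataset rather than the dataset itself."""
    name: str
    creator: str
    users: list = field(default_factory=list)
    columns: list = field(default_factory=list)
    related_datasets: list = field(default_factory=list)


# Made-up metadata for a hypothetical "trips" dataset.
trips_metadata = DatasetMetadata(
    name="trips",
    creator="operations team",
    users=["finance team", "analytics team"],
    columns=[
        ColumnInfo("trip_id", "int", "Unique identifier of the trip"),
        ColumnInfo("fare", "float", "Fare charged for the trip"),
    ],
    related_datasets=["drivers", "passengers"],
)

column_names = [column.name for column in trips_metadata.columns]
print(column_names)
```

Note that the record contains no trips at all: it only tells a reader how to interpret the trip data and where it fits among other datasets.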
As you’ve learned by now, there are many different types of data. Each type is related to the others, and each serves its own useful purpose. Understanding the benefit and purpose of each type will help your business with data collection, analytics, and business decisions.