DATABERG
Empowering Dark Data: What's behind the hype?
Introduction
Today’s business leaders have a myriad of technologies at their disposal to drive their innovation initiatives. However, they are also aware of the shortcomings of their current data lakes. Because of weaknesses in the technologies used to collect data, highly valuable data sets are not being leveraged at all. With new software technologies, these data can be captured to provide valuable insights.
What is Dark Data?
You have probably heard of the term Big Data but, more recently, Dark Data has also come onto the radar. The reason is the great opportunity for business growth and competitive advantage that lies in leveraging Dark Data.
Dark Data refers to the information assets that companies generate during their regular activity but do not use in any way to derive insights or make decisions. According to KPMG, 80% of data is Dark Data and is therefore currently not used.
By using Dark Data solutions, companies get richer and better insights, such as a better understanding of market demand or of employee and vehicle movements in a warehouse, which help them improve their operational performance and productivity.
Types of data
Stored and used
These are the transactional data found in our databases, which are needed and used for daily operations. They can be organized and processed easily. They can be either human- or machine-generated, as long as they are created within a relational database structure. These data represent 1% to 5% of the data generated by a company.
Stored but not used
These are raw transactional data, usually part of daily operations but spread across various departments or systems. For many businesses, transforming these unstructured data into structured insights is still a barrier. Examples include CRM and ERP data, emails, documents, and application logs. These data represent 5% to 10% of the data generated by a company.
Not collected, not stored, and not used
It is estimated that 80% of data generated by companies is not collected or stored. This is what we know as Dark Data. This can be data flowing in network transactions, data generated by WiFi technologies, or data produced by IoT sensors and industrial machines.
Where can we find Dark Data?
Network Transactions
Networking is the technology behind the Internet, which has reshaped information exchange in modern society. Tons of in-transit data remain hidden and unleveraged because it is difficult to collect and process the temporary transactions that flow over the network. By using network sniffing and deep packet inspection (DPI), these in-transit data can be collected and transformed into valuable insights.
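As a rough illustration of what reading in-transit data involves, the sketch below decodes the fixed 20-byte part of an IPv4 header from raw bytes. It is only the first step that precedes deeper inspection of the payload, not a DPI engine, and it does not capture live traffic (that would require a capture tool and the appropriate privileges). The function name parse_ipv4_header and the sample addresses are assumptions made for this example.

```python
import socket
import struct


def parse_ipv4_header(packet: bytes) -> dict:
    """Decode the fixed 20-byte part of an IPv4 header (options are ignored)."""
    if len(packet) < 20:
        raise ValueError("packet too short for an IPv4 header")
    # Network byte order: version/IHL, TOS, total length, identification,
    # flags/fragment offset, TTL, protocol, checksum, source, destination.
    (version_ihl, _tos, total_length, _ident, _flags_frag,
     ttl, protocol, _checksum, src, dst) = struct.unpack(
        "!BBHHHBBH4s4s", packet[:20]
    )
    return {
        "version": version_ihl >> 4,
        "header_bytes": (version_ihl & 0x0F) * 4,
        "total_length": total_length,
        "ttl": ttl,
        "protocol": protocol,  # 6 is TCP, 17 is UDP
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }


# Hand-built sample header (hypothetical addresses, checksum left at zero).
sample = struct.pack(
    "!BBHHHBBH4s4s", 0x45, 0, 40, 1, 0, 64, 6, 0,
    socket.inet_aton("10.0.0.5"), socket.inet_aton("192.168.1.20"),
)
info = parse_ipv4_header(sample)
print(info["src"], "->", info["dst"], "protocol", info["protocol"])
```

Fields such as source, destination, and protocol are typically what gets aggregated into insights; the payload itself needs protocol-specific decoding that is out of scope here.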
Distributed Locations
Companies are not always monolithic when it comes to information systems. Businesses such as retail or manufacturing often have several distributed locations: shops, factories, and warehouses. These locations have their own information systems, which are often heterogeneous. Interoperability between these satellite systems and the central systems has always been a problem.
Unconsolidated IT landscapes, with distributed locations running legacy systems and/or heterogeneous technologies, are the main reason why companies still have data silos. Data integration technologies, such as file transfer or ETL for database synchronization, are the means to consolidate distributed data.
Industrial Networks and Devices
Operational Technology (OT) and Information Technology (IT) have evolved in parallel with limited touch points. OT comprises the hardware and software that support the control, automation, and monitoring of devices and machines. In practice, however, IT has now outpaced OT in terms of cost, flexibility, available manpower, etc.
Hundreds of different and incompatible industrial automation protocols are in use today. As a result, highly valuable operational data is trapped inside machines, devices, and sensors. Many technologies can extract data from industrial machines and devices; a few examples are OPC, SCADA, network sniffing, and proprietary monitoring protocols.
Wi-Fi technologies
Nowadays, Wi-Fi is present in almost every facility, public and private. Wi-Fi is a family of radio technologies commonly used for wireless networking of devices. However, the potential of Wi-Fi goes well beyond pure wireless connectivity. Each Wi-Fi access point already generates a huge variety of data that remains poorly explored.
These Dark Data can be used to deliver mobility intelligence to companies. Think of locating devices and understanding how those devices are moving around.
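To make the idea of mobility intelligence more concrete, here is a minimal sketch that turns a list of access point sightings into a movement path per device. The record layout (timestamp, device identifier, access point name) and all the names used are hypothetical; real access points expose this kind of information in vendor-specific ways.

```python
from collections import defaultdict


def device_paths(sightings):
    """Return, per device, the ordered list of access points it was seen at.

    `sightings` is an iterable of (timestamp, device_id, access_point) tuples.
    Consecutive sightings at the same access point are collapsed into one stop.
    """
    by_device = defaultdict(list)
    # Sorting the tuples orders them by timestamp first.
    for _timestamp, device_id, access_point in sorted(sightings):
        path = by_device[device_id]
        if not path or path[-1] != access_point:
            path.append(access_point)
    return dict(by_device)


# Hypothetical sightings, deliberately out of order.
sightings = [
    (3, "dev-a", "AP-dock"),
    (1, "dev-a", "AP-entrance"),
    (2, "dev-a", "AP-entrance"),
    (1, "dev-b", "AP-dock"),
    (4, "dev-b", "AP-aisle-3"),
]
paths = device_paths(sightings)
print(paths)
```

With these sample records, dev-a is seen moving from the entrance to the dock and dev-b from the dock to aisle 3.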
How to leverage Dark Data?
Three basic steps need to be covered to leverage Dark Data and make them available for further use.
Capture step
The capture step is about getting your hands on the interesting data sources. Network sniffing offers an opportunity to gather these data without interfering with the related systems.
Process step
The process step consists of refining the raw captured data to make them usable afterwards. The data you collect are likely to be massive and will require summarization. In other cases, the data will require some enrichment, such as translating codes into understandable labels.
Edge computing technology helps to streamline the processing of Dark Data, as it transforms these large data sets into pre-processed data close to the source, with near real-time insights. By sending only cleansed and purpose-oriented data, it avoids unnecessary network bandwidth consumption and costly large-scale computing power in cloud servers.
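The summarization and enrichment described in this step could look like the following sketch, which could run close to the source so that only the compact summary is sent onwards. The status codes, their labels, and the machine names are assumptions made for the example.

```python
from collections import Counter

# Hypothetical code-to-label table used for enrichment.
STATUS_LABELS = {0: "idle", 1: "running", 2: "fault"}


def summarize(readings):
    """Count raw readings per (machine, labelled status).

    `readings` is an iterable of (machine_id, status_code) pairs.
    Codes missing from STATUS_LABELS are labelled "unknown".
    """
    counts = Counter()
    for machine_id, status_code in readings:
        label = STATUS_LABELS.get(status_code, "unknown")
        counts[(machine_id, label)] += 1
    return [
        {"machine": machine, "status": label, "count": n}
        for (machine, label), n in sorted(counts.items())
    ]


raw = [
    ("press-1", 1), ("press-1", 1), ("press-1", 2),
    ("press-2", 0), ("press-2", 9),
]
summary = summarize(raw)
for row in summary:
    print(row)
```

Five raw readings become four summary rows here; with realistic volumes the reduction is what saves bandwidth and cloud processing.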
Store step
The store step closes the ingestion. After you have tapped into the right data source and processed it to yield high-quality data, you need storage to make these data usable. The location, structure, durability, and technology are all aspects that need to be decided. At the end of this step, you have transformed Dark Data into “standard” data.
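As one possible illustration of the store step, the sketch below writes already processed rows into a relational table using SQLite from the Python standard library. The choice of SQLite, the table name machine_status, and its columns are assumptions for the example; the right storage technology depends on the location, structure, and durability decisions mentioned above.

```python
import sqlite3


def store_summary(conn, rows):
    """Persist processed rows; re-sending a row replaces the earlier value."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS machine_status ("
        "machine TEXT, status TEXT, n_readings INTEGER, "
        "PRIMARY KEY (machine, status))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO machine_status (machine, status, n_readings) "
        "VALUES (:machine, :status, :n_readings)",
        rows,
    )
    conn.commit()


# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
store_summary(conn, [
    {"machine": "press-1", "status": "running", "n_readings": 2},
    {"machine": "press-1", "status": "fault", "n_readings": 1},
])
total = conn.execute(
    "SELECT SUM(n_readings) FROM machine_status WHERE machine = ?",
    ("press-1",),
).fetchone()[0]
print(total)
```

Once the rows are in a table like this, they can be queried with the same tools as any other transactional data.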
Conclusion
The ocean of opportunities for business and operational improvement that this unexplored resource opens up is appealing to companies all around the world. For industries such as Finance, Healthcare, Travel, Hospitality, Logistics, Retail, Manufacturing, and more, technology is one of the main pillars. Yet to this day, they are leaving 80% of their data unused.
Now that technology makes it possible to collect and manage Dark Data efficiently and affordably, new success stories keep emerging.