Shedding light on Dark Data
We are experiencing an explosion of available data: almost everything “new” is at our fingertips through Internet search engines within days, or sometimes even minutes, of being published. Yet this information is only the tip of the iceberg: relevant, often business-critical data is hiding in plain sight, just out of our reach. Tapping into this dark data is a growing trend.
The cost of adding connectivity to formerly “dumb” devices has plummeted, resulting in a major expansion of “intelligent” devices in our everyday environment. This explosion of the Internet of Things has opened the floodgates for massive parallel data streams, of which we tend to tap into only a small fraction.
Similarly, many legacy systems were never built with holistic data collection in mind:
Take an existing factory data gathering network as an example: the original requirement of such a system was only to automate a control loop by applying set limit values for operations and alerts. The momentary value of an individual sensor in such a control loop was of no importance as long as it stayed within those limits. Therefore, a slowly drifting value that remains under the system’s radar goes undetected until it passes a preset critical value. This may lead to a disruptive production halt due to unscheduled maintenance.
Now let’s assume that we can tap into this control loop. To do this without causing any additional traffic load inside the loop itself, we can configure a switch to mirror all traffic into another network segment. In that segment we place a special program that captures all packets, performs some intelligent edge processing on the data, and sends the results in a logically condensed format to a data lake for further processing.
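To make the “logically condensed format” more concrete, here is a minimal Python sketch of the kind of edge processing described above. It is only an illustration and not how the Datumize Data Collector is implemented: the `condense` function, the 60-second window, the `(timestamp, sensor_id, value)` tuple layout and the sensor name `temp-1` are all assumptions made for this example, and the readings are assumed to have already been decoded from the mirrored packets.

```python
from collections import defaultdict
from statistics import mean


def condense(readings, window_s=60):
    """Summarize raw readings into one record per sensor per time window.

    `readings` is an iterable of (timestamp_s, sensor_id, value) tuples.
    This layout is a hypothetical one chosen for the example.
    """
    buckets = defaultdict(list)
    for ts, sensor_id, value in readings:
        # Align each reading to the start of its fixed-size window.
        window_start = int(ts // window_s) * window_s
        buckets[(sensor_id, window_start)].append(value)

    # Emit a compact summary instead of forwarding every raw value.
    return [
        {
            "sensor": sensor_id,
            "window_start": window_start,
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
        }
        for (sensor_id, window_start), values in sorted(buckets.items())
    ]


# Four raw readings from one (hypothetical) sensor become two summaries.
readings = [
    (0, "temp-1", 70.0),
    (20, "temp-1", 70.4),
    (40, "temp-1", 70.2),
    (65, "temp-1", 70.9),
]
summaries = condense(readings)
```

Only the summaries would travel to the data lake, so the volume leaving the edge stays small while the minimum, maximum and mean of every window are preserved for later trend analysis.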
Business Intelligence software can then pick up this new data stream of finer-grained time series values and present long-term trends in an intuitive graphical format. A trend may reveal the need for preventive maintenance, which can be performed during scheduled downtime, so it does not affect the output of the factory or require costly overtime or one-day shipping of spare parts.
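As a rough illustration of how such a trend could be turned into an early warning, the sketch below fits a straight line to a series of condensed values and estimates how long it would take to reach the alarm limit. The function names, the sample values and the limit of 80.0 are hypothetical, and a plain least-squares fit is just one simple way to do this; real sensor data would likely call for a more robust method.

```python
def slope_per_step(values):
    """Least-squares slope of equally spaced values (change per step)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to fit a trend")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    numerator = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    denominator = sum((i - x_mean) ** 2 for i in range(n))
    return numerator / denominator


def steps_until_limit(values, limit):
    """Estimate the steps left before the upward trend reaches `limit`.

    Returns None when the fitted trend is flat or falling.
    """
    slope = slope_per_step(values)
    if slope <= 0:
        return None
    return (limit - values[-1]) / slope


# Hypothetical daily means: every value is far below the alarm limit,
# so a limit-only control loop would stay silent.
daily_means = [70.0, 70.5, 71.0, 71.5, 72.0]
alarm_limit = 80.0

drift = slope_per_step(daily_means)
days_left = steps_until_limit(daily_means, alarm_limit)
```

With these sample values the fitted drift is half a unit per day, which puts the limit roughly 16 days away: enough notice to plan the maintenance into a scheduled stop.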
Similar cases of hidden dark data can be found in almost any legacy system: according to various studies, companies utilize only about 5-10% of the data they generate. Transient information is not tapped into, and this is where Datumize can step in: our Datumize Data Collector (DDC) offers the kind of Edge Processing technology that is required to handle high-volume streams, allowing organizations to tap into their existing information streams with ease.
The example above was just one simple real-life use case. Similarly, systems that deal with customer input, such as searches for available flights or spare parts, often record only an on-off result: a search is either made or not. What happens during the process, for example searches that return no results or that the customer abandons before completing, may never be recorded.
By using the DDC technology, you can capture much more fine-grained data from customer interactions, opening up entirely new insights into your business.