DATABERG
Dark Data integration from unexplored sources with Datumize Data Collector
Data is the raw material for any innovation, intelligence, or analytics-related project in companies: ensuring that we have the right data is the key to succeeding in these initiatives. In today's world, the volume and variety of data that companies generate is enormous, which makes it harder to pinpoint the essential data. It is estimated that more than 80% of data in companies is Dark Data: data that is not collected, and hence not leveraged at all.
Traditional data integration tools help companies ingest and integrate transactional data from sources such as databases, APIs, files, and even legacy systems. But as the approach to data collection broadens, companies are forced to explore more complex and sophisticated sources that probably hold valuable, neglected data: transient network traffic, distributed locations, industrial networks, or Wi-Fi status information.
Leveraging Dark Data with Datumize Data Collector (DDC)
Datumize has developed a lightweight and robust software product that is capable of ingesting data from these more complex sources without adding overhead to them. Datumize launched Datumize Data Collector (DDC) in 2015 to provide an agent that is easy to deploy and manage remotely, with the aim of helping companies collect Dark Data. In version 4.0, Datumize Data Collector (DDC) includes a set of features that make it a powerful and highly competitive product for any data ingestion project.
- Ingestion: As already mentioned, the primary differentiator of Datumize Data Collector (DDC) is its capacity to access sophisticated data sources, in particular by performing deep packet inspection on network traffic (including industrial networks) with a non-intrusive technique known as network sniffing. The same product can also collect data from sensors, Wi-Fi networks, distributed databases, and a myriad of other sources.
- Processing: One of the most relevant features included in the latest version of Datumize Data Collector (DDC) is Edge Computing: the same agent can be configured to enrich and cleanse data, dramatically cutting down the noise and volume of the data stream. Datumize Data Collector (DDC) is equipped to assemble data from network TCP/UDP packets, industrial protocols (Modbus, Fieldbus, OPC-DA), database interfaces, HTTP, and mobility data from wireless networks. DDC’s additional Cook Processor functionality allows enrichment and cleansing, and even adding business logic to the data processing step. The result is a clean, focused data stream that the organization can put to use.
- Sinks: Finally, Datumize Data Collector (DDC) sends the prepared data to any file, database, or Big Data platform in multiple formats, or connects to third-party systems via API. The software is particularly geared towards feeding Kafka streams.
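To make the three stages concrete, here is a minimal, generic sketch in Python of an ingest, cleanse/enrich, and sink flow. It is not the DDC API or configuration format: every name in it (`RAW_READINGS`, `ingest`, `cleanse_and_enrich`, `sink`, `run_pipeline`, and the `site` label) is a hypothetical stand-in chosen for illustration, and the sink simply serializes records to JSON lines rather than writing to a real platform.

```python
import json
from typing import Dict, Iterable, Iterator, List

# Hypothetical raw records, standing in for data assembled from a source.
RAW_READINGS: List[Dict] = [
    {"sensor": "s1", "temp_c": 21.5},
    {"sensor": "s2", "temp_c": None},  # incomplete record, treated as noise
    {"sensor": "s1", "temp_c": 22.0},
]


def ingest(records: Iterable[Dict]) -> Iterator[Dict]:
    """Ingestion stage: yield a copy of each raw record, one at a time."""
    for record in records:
        yield dict(record)


def cleanse_and_enrich(records: Iterable[Dict], site: str) -> Iterator[Dict]:
    """Processing stage: drop incomplete records and add derived fields."""
    for record in records:
        if record.get("temp_c") is None:
            continue  # cleansing: discard records with no reading
        record["site"] = site  # enrichment: attach context
        record["temp_f"] = round(record["temp_c"] * 9 / 5 + 32, 1)
        yield record


def sink(records: Iterable[Dict]) -> List[str]:
    """Sink stage: serialize each record as a JSON line."""
    return [json.dumps(record, sort_keys=True) for record in records]


def run_pipeline(raw: Iterable[Dict], site: str = "plant-a") -> List[str]:
    """Chain the three stages; generators keep processing record by record."""
    return sink(cleanse_and_enrich(ingest(raw), site))


if __name__ == "__main__":
    for line in run_pipeline(RAW_READINGS):
        print(line)
```

Filtering and enriching before the sink stage is what reduces the volume sent downstream: in this sketch, three raw records go in and only the two complete ones come out.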
Datumize Data Collector (DDC) is being used in several projects at leading companies, all of them aimed at turning Dark Data into business, customer, or operational intelligence. Thanks to our technology, our clients are discovering new and compelling insights that help them keep growing their business.