The term Dark Data has not been around for long, and it owes its existence to its older brother, Big Data. Big Data has become the term everyone uses when they try to understand just how much data they are producing and how they can start to make sense of it all. For any business, data is vital: it holds the key to attracting new customers, increasing growth and creating bigger profits. That’s why Big Data is Big Business.
However, within any organisation’s big data platform or internal network infrastructure there is Dark Data: data that is stored but never put to use. This is the main reason why there has been a huge rise in the importance of “Dark Data” in recent years. Dark Data is not just a small portion of Big Data. It is the biggest slice of the pie, and holds a massive amount of potential for those who want to harness it.
Imagine an internet without Google or other search engines, where web pages are not tagged or identifiable by keywords or search terms. This is what corporate information networks look like: important corporate data is held within various business applications, network folders and locations, all inaccessible to the workers who could benefit from it.
How is Dark Data classified in a business?
Data can be classified into three areas for any organisation. First and foremost, there is critical business data. This is the data that is needed and used to power the operation of the business, ensure that goals are met, and allow the business to grow year on year. Second, there is your ROT (redundant, obsolete and trivial) data. This data is not used and has no importance to the business. Third, and most important, is your Dark Data. This is the data that sits below the surface, hiding within your internal networks, and it can hold a huge amount of relevant information that could be moved into your critical data set.
What types of data could be dark?
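The three-way split above can be written down as a simple decision rule. The following is only a minimal sketch: the `DataAsset` type and its two flags, `actively_used` and `potentially_relevant`, are hypothetical names introduced for illustration, and in practice deciding whether something is "potentially relevant" is a judgement call rather than a ready-made field.

```python
from dataclasses import dataclass
from enum import Enum


class DataClass(Enum):
    CRITICAL = "critical"
    ROT = "rot"      # redundant, obsolete and trivial
    DARK = "dark"


@dataclass
class DataAsset:
    # Hypothetical descriptor for one data set, file share or database.
    name: str
    actively_used: bool          # assumption: the business uses it today
    potentially_relevant: bool   # assumption: someone judged it may hold value


def classify(asset: DataAsset) -> DataClass:
    """Apply the three-way split described in the text."""
    if asset.actively_used:
        return DataClass.CRITICAL
    if asset.potentially_relevant:
        # Unused today, but could be promoted to the critical data set.
        return DataClass.DARK
    return DataClass.ROT
```

For example, an unused archive of customer survey results would come out as `DataClass.DARK`, while an unused folder of duplicate drafts would come out as `DataClass.ROT`.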
According to a recent IBM study, over 80% of all data is dark and unstructured. IBM estimates that this will rise to 93% by 2020, giving the example that cars will be generating 350MB of data every second, all of which will need to go somewhere.
Dark data is different for each industry and individual company, but common examples include:
⦁ Spreadsheets (in one study, a business with 1,500 employees had 2.5 million spreadsheets, amounting to billions of cells of data)
⦁ Multiple old versions of documents
⦁ Email attachments and .zip files that are downloaded and then ignored
⦁ Inactive databases and unused customer information
⦁ Previous employee files and content (e.g. project notes)
⦁ Analytics reports and survey data
⦁ Log files, account information and transaction history
Ultimately, it’s data that’s left behind from processes, scattered across every level of a business. It’s disregarded and considered unnecessary by one department, but may be highly valuable to another.
How to harness your “Dark Data”
There are three key steps to getting the most from your Dark Data: capturing it, unlocking it and gaining results through BI. Capturing is arguably the most difficult part. It means knowing what to look for and where to look, without modifying your systems or deploying an intrusive agent. Once the Dark Data has been identified, the second step is to unlock it by pushing it to a big data platform, where it can be used as part of a BI solution. From here, companies will be able to determine the value of their Dark Data.
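As an illustration of the capture step, here is a minimal sketch of a read-only inventory of a shared folder. It only reads file metadata, so it does not modify anything or require an agent on the systems being scanned. The function name `inventory`, the record fields and the one-year staleness threshold are assumptions made for this example, not part of any particular product.

```python
import os
import time
from pathlib import Path


def inventory(root, stale_after_days=365, now=None):
    """Walk a directory tree and record metadata for every file.

    Files not modified within `stale_after_days` are flagged as stale,
    which is used here as a rough (assumed) hint that they may be dark.
    """
    now = time.time() if now is None else now
    cutoff = now - stale_after_days * 86400
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = Path(dirpath) / filename
            try:
                info = path.stat()  # metadata only; file contents are not read
            except OSError:
                continue  # skip files that vanish or cannot be accessed
            records.append({
                "path": str(path),
                "extension": path.suffix.lower(),
                "size_bytes": info.st_size,
                "stale": info.st_mtime < cutoff,
            })
    return records
```

The resulting list of records could then be loaded into whatever big data platform or BI tool is used for the unlock step, for instance to count stale spreadsheets per department.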
Why have companies until now been reluctant to harness their dark data?
For many businesses, capturing their dark data can seem overwhelming at first glance, with no internal priority, resources or clear understanding behind it. For this to change, businesses need to see the clear advantage of capturing their Dark Data and how it will change the way they operate. It is worth viewing Dark Data under another definition: “Unfulfilled Value”. It is information that can provide huge amounts of knowledge, which can in turn be used to increase profits.
By utilising new technologies around business intelligence and IT tools, companies can join structured and unstructured data sets together to provide high-value results. When done correctly, the benefits will easily outweigh the costs involved with mining dark data.
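To make the idea of joining structured and unstructured data concrete, here is a small sketch. It assumes a structured customer table and a pile of free-text notes that mention customers by an identifier of the form `CUST-<number>`. That identifier format, the field names and the function `join_notes_to_customers` are all hypothetical and chosen only for illustration.

```python
import re
from collections import defaultdict

# Assumed identifier format for this example, e.g. "CUST-1042".
CUSTOMER_ID = re.compile(r"\bCUST-\d+\b")


def join_notes_to_customers(customers, notes):
    """Attach each free-text note to every customer record it mentions."""
    mentions = defaultdict(list)
    for note in notes:
        # Use a set so a note that repeats an ID is attached only once.
        for customer_id in set(CUSTOMER_ID.findall(note)):
            mentions[customer_id].append(note)
    return [
        {**customer, "notes": mentions.get(customer["id"], [])}
        for customer in customers
    ]
```

Real unstructured data rarely carries such a clean key, so in practice the matching step is where most of the effort goes. The shape of the result is the same, though: structured records enriched with content that previously sat unused.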