DATABERG
Why are large corporations now focusing on Smart Data?
There is no debate about the relevance that the Big Data concept has gained in the corporate world. But several parallel concepts, such as small data, dark data, and Smart Data, are steadily gaining ground within the data and analytics strategies of businesses.
Today we are going to focus on this last one: Smart Data. The main reason why Smart Data has become a relevant trend is that it offers a more efficient and profitable strategy to manage and leverage the vast amount of data and information companies are generating.
From Big Data to Smart Data
Let's start by reviewing the definition of each concept. According to Wikipedia, "Big data" is a field that treats ways to analyze, systematically extract information from, or otherwise deal with data sets that are too large or complex to be dealt with by traditional data-processing application software. For Smart Data, definitions vary more, but this one is a good starting point: Smart data is digital information that is formatted so it can be acted upon at the collection point before being sent to a downstream analytics platform for further data consolidation and analytics.
So, in summary, while Big Data refers to the volume of data that can be managed, Smart Data focuses on the data that is valuable: on quality rather than quantity.
The rise of Smart Data projects has its roots, in fact, in the failure of many Big Data projects. According to a recent Dataversity report, more than 63% of Big Data projects have not succeeded in producing data-driven insights. This is where Smart Data comes in, as a way to collect, compile, and process data so that it can be used efficiently to generate new insights. Big Data means volume, variety, velocity, and veracity. But Smart Data also means value.
How Smart Data is created
One of the main advantages of Smart Data lies in how it is generated. Data is not smart per se; you need to process it to make it smart. The beauty of Smart Data is that this transformation happens, most of the time, close to the data source, thanks mainly to Edge Computing technologies. Smart Data is formatted at the collection point, where unstructured, low-value data is transformed into structured, valuable data ready for further consolidation and analysis.
That is why Smart Data is gaining prominence: it offers an opportunity to develop faster and more efficient data and analytics.
By applying decisions and processing to incoming data immediately, at the entry point itself, we avoid the need for processing power from a centralized system. In other words, we move from batch processing to edge processing. Around 10% of enterprise-generated data is created and processed outside a traditional centralized data center or cloud. By 2022, Gartner predicts this figure will reach 75%.
Thanks to Smart Data, we can develop, for example, real-time analytics: because data arrives at the output sink already processed, it is easier and faster to turn it into insight. A Smart Data approach can monitor data at the source and capture only the events that are relevant to our analytics purpose.
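To make the idea of formatting data at the collection point more concrete, here is a minimal Python sketch. It assumes a hypothetical raw format of comma-separated "sensor_id, timestamp, value" lines; the function name to_smart_record and the sample data are made up for illustration and do not come from any particular product. Malformed lines are dropped at the source, so only structured records move on to consolidation.

```python
import json
from typing import Optional


def to_smart_record(raw_line: str) -> Optional[dict]:
    """Turn one raw 'sensor_id, timestamp, value' line into a structured record.

    Returns None for malformed lines, so bad data is dropped at the
    collection point instead of travelling downstream.
    (The input format is an assumption made for this example.)
    """
    parts = [p.strip() for p in raw_line.split(",")]
    if len(parts) != 3:
        return None
    sensor_id, timestamp, value = parts
    if not sensor_id or not timestamp:
        return None
    try:
        reading = float(value)
    except ValueError:
        return None
    return {"sensor_id": sensor_id, "timestamp": timestamp, "value": reading}


# Hypothetical raw input as it might arrive from a device.
raw_lines = [
    "pump-7, 2021-03-01T10:00:00Z, 71.5",
    "pump-7, 2021-03-01T10:00:05Z, n/a",
    "garbage line",
]

records = [r for r in map(to_smart_record, raw_lines) if r is not None]
print(json.dumps(records, indent=2))
```

In this sketch only the first line survives, already typed and labelled, which is the kind of structured data a downstream analytics platform can consume directly.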
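Capturing only the relevant events at the source could look something like the following Python sketch. The threshold, the field names, and the relevant_events function are assumptions chosen for the example; a real deployment would define relevance according to its own analytics purpose.

```python
from typing import Iterable, Iterator

# Hypothetical threshold: readings above it are the events we care about.
TEMP_THRESHOLD_C = 80.0


def relevant_events(
    readings: Iterable[dict], threshold: float = TEMP_THRESHOLD_C
) -> Iterator[dict]:
    """Yield only the readings that exceed the threshold, tagged as events.

    Everything else is discarded at the source and never sent downstream.
    """
    for reading in readings:
        if reading["temperature_c"] > threshold:
            yield {**reading, "event": "overheat"}


# Hypothetical stream of readings observed at the edge.
stream = [
    {"device": "oven-1", "temperature_c": 72.0},
    {"device": "oven-1", "temperature_c": 85.5},
    {"device": "oven-2", "temperature_c": 79.9},
    {"device": "oven-2", "temperature_c": 91.2},
]

forwarded = list(relevant_events(stream))
print(f"forwarded {len(forwarded)} of {len(stream)} readings")
```

Because the filter is a generator, each reading is evaluated as it arrives rather than in a later batch, which is what allows the sink to receive data that is already processed.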
Why collect Smart Data
As we've already explained, the Big Data approach of many large corporations, which has proved to fail, consisted of collecting everything, all data, and storing it in the famous "Data Lake." Over the last few years, building a Data Lake became a trend among large enterprises. Most of them were proud of it and spread the word about why and how they were building their data lake and feeding it with Big Data. The main objective of these data lakes is to have all of a company's data in one single place, so it can be used afterward for whatever analytics or even AI project the company wants to implement in the future (near future, or even present). However, two main concerns have recently arisen:
The first is that, although companies are trying their best and using the latest technologies to gather all their organizational data, in reality an enormous volume and variety of data is not being collected (dark data), and the data that is collected may still lack quality or quantity, or may even be in the wrong format. Organizations are managing, more or less, to gather all their structured data, and the most innovative ones are even gathering unstructured data that is already stored. But the unstructured data that organizations generate and never store, the dark data, still accounts for at least 80% of the data generated by companies.
The second is that data takes time, money, and effort to collect, store, organize, and govern. That's why collecting Smart Data, rather than swallowing all data, might be a more efficient and profitable strategy. Some leading large corporations are now implementing purpose-oriented data strategies of this kind, concentrating their efforts on the data that will bring them real business value.
Smart Data and IoT
The term Smart Data is often associated with the Internet of Things (IoT). On the one hand, IoT-enabled edge devices are a crucial driver of growth in data volumes. On the other hand, the effectiveness of the IoT hinges heavily on the availability of data analytics to derive value from all the data that is collected. As we saw above, the traditional model of collecting, processing, and storing all data is becoming too costly and too slow to meet the requirements of IoT technologies, products, and projects. So the need for quick and reliable insights from IoT devices has made Smart Data a typical approach in IoT.
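One common way to reduce the cost of shipping every IoT reading is to summarize readings on the device and send only the summary. The Python sketch below illustrates that idea under stated assumptions: the summarize_window function, the device name, and the chosen statistics are hypothetical and are not taken from any specific IoT platform.

```python
from statistics import mean
from typing import Sequence


def summarize_window(device_id: str, values: Sequence[float]) -> dict:
    """Collapse one window of raw readings into a single summary record.

    Only this summary would be sent upstream, instead of every reading.
    """
    if not values:
        raise ValueError("cannot summarize an empty window")
    return {
        "device_id": device_id,
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": round(mean(values), 2),
    }


# Hypothetical window of temperature readings collected on the device.
window = [20.1, 20.4, 19.8, 21.0, 20.7]
summary = summarize_window("thermostat-3", window)
print(summary)
```

Here five readings become one record. The trade-off is that detail inside the window is lost, so the window size and the statistics kept should follow from the purpose of the analysis.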
The rise of Smart Data
'If a machine-learning algorithm can give product recommendations using modest data sets, why take the big data route?'
But as Information Age points out in its recent article "Is big data dead? The rise of smart data", the fact that a Smart Data approach might prove more efficient and convenient for some projects and strategies doesn't mean that we should forget about Big Data. The key is to find the strategy that is best aligned with the purpose of our data and analytics challenge.