Data Integrity refers to the accuracy and reliability of data. It also encompasses completeness, consistency, coherence, and fidelity to the original. These qualities should apply to the data itself, not just to the place where it is stored.
To ensure Data Integrity, we must prevent the content of a database, a process, or a system from being accidentally or intentionally modified or destroyed.
Data Integrity tends to be compromised, especially when data is manipulated, transferred, or processed.
How to ensure our Data Integrity
There are some rules that we must follow if we want to minimize the risks to data quality and integrity:
- Properly manage access permissions throughout the entire data lifecycle, limiting the possibility of unauthorized access to your data.
- Always validate data, both when gathering it and when using it: put in place the processes needed to continually verify its integrity.
- Make sure you have backups of your data.
- Use a logging system to provide reliable traceability whenever data is added, modified, or deleted.
- Conduct internal audits regularly.
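The validation and traceability rules above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the `ProductRecord` fields, the validation checks, and the in-memory `audit_log` are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical product record; the fields are illustrative only.
@dataclass
class ProductRecord:
    sku: str
    name: str
    price: float

def validate(record: ProductRecord) -> list:
    """Return a list of integrity problems found in the record."""
    problems = []
    if not record.sku:
        problems.append("missing SKU")
    if not record.name:
        problems.append("missing name")
    if record.price < 0:
        problems.append("negative price")
    return problems

# In practice this would be an append-only store, not a Python list.
audit_log = []

def save(record: ProductRecord, user: str) -> None:
    """Validate the record, then persist it with an audit-trail entry."""
    problems = validate(record)
    if problems:
        # Reject invalid data instead of letting it enter the system.
        raise ValueError("rejected %r: %s" % (record.sku, problems))
    audit_log.append({
        "when": datetime.now(timezone.utc).isoformat(),
        "who": user,
        "action": "save",
        "sku": record.sku,
    })
```

The key point is that validation happens before persistence, and every accepted change leaves a log entry recording who changed what and when.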
Why is it essential to maintain Data Integrity
Data is the primary raw material that a company uses for its analytical purposes, and consequently it is the very first step in the chain toward decision making. Leaks and weaknesses at this first step will inevitably produce poor analytics and poor decisions. That is why ensuring Data Integrity is of utmost importance for a company.
How the source can affect your Data Integrity
The large number of data sources in companies has made it more challenging to ensure Data Integrity, first because we will probably be gathering data from different, unconsolidated sources. Because the sources are complex and disparate, it is hard to integrate the data and standardize its format, which in turn makes Data Integrity harder to monitor. Having a data integration solution in place that feeds our Data Warehouse with the data spread across the corporation helps trace the integrity and quality of our data throughout its entire value chain.
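To make the standardization problem concrete, here is a minimal sketch of normalizing two hypothetical source formats into one schema. The field names (`product_id`, `price_cents`, `unit_price`) and the precedence rule are assumptions made for illustration, not any real system's format.

```python
def normalize_from_pos(rec: dict) -> dict:
    """Normalize a point-of-sale export (prices stored in cents)."""
    return {"sku": rec["product_id"], "price": rec["price_cents"] / 100}

def normalize_from_supplier(rec: dict) -> dict:
    """Normalize a supplier feed (prices already in currency units)."""
    return {"sku": rec["sku"], "price": float(rec["unit_price"])}

def integrate(pos_records, supplier_records):
    """Merge both sources into one standardized list, keyed by SKU."""
    merged = {}
    for rec in map(normalize_from_pos, pos_records):
        merged[rec["sku"]] = rec
    for rec in map(normalize_from_supplier, supplier_records):
        # POS data takes precedence when both sources have the same SKU.
        merged.setdefault(rec["sku"], rec)
    return list(merged.values())
```

Once every source is mapped to one schema, integrity checks and monitoring only need to be written once, against the standardized format.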
The relevance of Data Integrity for Retail companies
It’s evident that retail companies, like any other business, need accurate data to make better decisions based on deep knowledge of their customers, sales, operations, etc.
But the most common landscape for retailers is composed of different and disparate systems and processes, including connections and interactions with several third parties (suppliers, distributors, brands, etc.), which means many sources and formats of data. This same landscape is responsible for the high risk of data inaccuracy in a retail company, and it is precisely this inaccuracy that tends to generate negative impacts on sales, on-time deliveries, operational efficiency, stock management, etc.
Over the past few years, data integrity has been seen as a relevant issue for most retailers. Retail companies need accurate and reliable data to forecast demand, analyze product development, improve operations, and make strategic business decisions.
Walmart’s use case: how data integrity can impact your business
Walmart offers a perfect real-life example of how data integrity, or the lack of it, can affect a retail business. One of the most common inconsistencies in retailers' data is a code on the packaging that does not match the item. Packaging and labeling are still, in most cases, manual processes, so many errors can occur during them. The domino effect then occurs when a retailer’s inventory replenishment systems use the wrong code to reorder the mislabeled product.
A study performed some years ago estimated that 25% of the product data in Walmart's Master Item Data Catalog was incorrect. For Walmart, the domino effect is even larger, because they use Retail Link to let suppliers collaboratively plan item replenishment based on POS data, so this collaborative tool spreads the dirty data faster. Walmart was using UPCs at the time and, due to the growth of dirty data in their systems, decided to move to a new standard, GTINs, with the idea of creating one data language for retailers and brands and avoiding the inconsistencies they were suffering. The estimate is that Walmart saves up to $2 billion annually thanks to this clean data.
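One simple, automatable integrity check on product codes like UPCs and GTINs is the standard GS1 check digit: the last digit of the code is computed from the others, so many typos and mislabels can be caught before they enter a replenishment system. Below is a small sketch of that published algorithm; the function names are my own.

```python
def gtin_check_digit(body: str) -> int:
    """Compute the GS1 check digit for a GTIN body (all digits but the last).

    Working from the rightmost digit of the body, digits are alternately
    weighted 3, 1, 3, 1, ...; the check digit brings the weighted sum
    up to the next multiple of 10.
    """
    total = sum(
        int(d) * (3 if i % 2 == 0 else 1)
        for i, d in enumerate(reversed(body))
    )
    return (10 - total % 10) % 10

def is_valid_gtin(code: str) -> bool:
    """True if a GTIN-8/12/13/14 (UPC-A is GTIN-12) has a correct check digit."""
    if not code.isdigit() or len(code) not in (8, 12, 13, 14):
        return False
    return gtin_check_digit(code[:-1]) == int(code[-1])
```

A check digit cannot detect a code that is valid but belongs to the wrong item, so it complements, rather than replaces, a shared data language like GTINs.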