Guest author: Dan Quirk, SAP National Practice Leader
While big data is widely believed to be worth the investment, a 2013 survey found that 60 percent of IT leaders admitted their organizations aren’t held accountable for data quality. As big data becomes more pervasive, what effect does dirty data have?
Data is “dirty” when it contains flawed information: misleading or skewed values, duplicate records, incorrect or inaccurate entries, and non-integrated data. Dirty data embodies the “garbage in, garbage out” (GIGO) principle: the quality of a program’s output can be no better than the quality of its input.
Dirty data costs companies time and money. Poor data quality can reduce productivity by as much as 20 percent, and more than 40 percent of business initiatives fail to meet their goals as a direct result of poor-quality data.
Skewed data leads to skewed analytics and performance measurement, and poor data quality leaves analysts less time to actually analyze. Roughly 67 percent of business analysts report spending too much time manually correcting data, which cuts into both analytic quality and profit.
How can organizations work toward keeping data clean? Here are five tips:
- Establish a data management strategy.
- Check databases for a baseline of quality, then budget for data maintenance and upkeep.
- Plan for regular maintenance and updates.
- Prevent duplicate records – and segment old ones.
- Establish relationships only with reputable data partners.
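The fourth tip, preventing duplicates and segmenting old records, can be automated. Below is a minimal sketch in Python, assuming contact records keyed by email address; the record fields, names, and the normalization rule are hypothetical illustrations, not part of the article.

```python
# Hypothetical example: flag duplicate contact records by normalized email.
# Field names and normalization rules are illustrative assumptions.

def normalize(value: str) -> str:
    """Normalize a field so trivially different duplicates match."""
    return value.strip().lower()

def dedupe(records):
    """Keep the first record seen per normalized email; segment the rest."""
    seen = set()
    unique, duplicates = [], []
    for rec in records:
        key = normalize(rec["email"])
        if key in seen:
            duplicates.append(rec)  # segmented for review, not silently dropped
        else:
            seen.add(key)
            unique.append(rec)
    return unique, duplicates

contacts = [
    {"name": "Ada", "email": "ada@example.com"},
    {"name": "Ada L.", "email": "ADA@example.com "},  # same person, messy entry
    {"name": "Grace", "email": "grace@example.com"},
]
unique, dupes = dedupe(contacts)
print(len(unique), len(dupes))  # 2 unique records, 1 flagged duplicate
```

Segmenting duplicates into their own list, rather than deleting them outright, mirrors the tip’s advice: old records stay available for audit while the clean set feeds downstream analytics.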
Hype or ripe: Is dirty data an inevitable result of big data?