What is a Data Diagnostic?

Data Diagnostics are coming to a data quality program near you. The notion of “diagnostics” isn't new; it comes from the scientific community, where a diagnostic is a means of distinguishing a condition through its symptoms or characteristics. At Kingland, we’ve been working to pioneer the development of data diagnostics over the last year, and I’d like to share with you why I think they are going to be so important.

In our work with many of the largest financial institutions, we’ve had the good fortune of analyzing millions of records, ranging from security master to client and counterparty databases. To borrow a comparison from healthcare, we are the doctors and the data is our patient. Just as doctors see many common symptoms, we too see a number of common symptoms (data quality problems) in the data. After years of looking at hierarchy data issues, duplication, inconsistencies, missing information, errant identifiers, and outdated records, we simply said enough is enough. Why can’t we build software that replicates the identification of these “symptoms” so we can diagnose our “patient” (these large piles of messy data)?

That’s where we went with Data Diagnostics. Data Diagnostics are simply the codification of the patterns we use to analyze and assess data quality. Because those patterns are codified from our analysis of millions of records, data diagnostics can be run and re-run on demand, and they are dramatically reducing the time and investment required in data quality programs. They’re also improving the consistency and predictability of those programs.
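To make “codifying a pattern” a little more concrete, here is a minimal sketch in Python of what a handful of diagnostics might look like. This is an illustration only, not Kingland’s software: the record layout, the field names (`id`, `name`, `identifier`, `updated`), the 20-character identifier rule, and the two-year staleness threshold are all assumptions made up for the example.

```python
from datetime import date

# Hypothetical client records; every field name and value is invented for illustration.
RECORDS = [
    {"id": "C1", "name": "Acme Corp", "identifier": "ABCDEFGHIJ0123456789", "updated": date(2015, 6, 1)},
    {"id": "C2", "name": "ACME CORP ", "identifier": "ABCDEFGHIJ0123456789", "updated": date(2012, 1, 15)},
    {"id": "C3", "name": "Globex Ltd", "identifier": "BAD-ID", "updated": date(2015, 9, 30)},
    {"id": "C4", "name": "", "identifier": "ZYXWVUTSRQ9876543210", "updated": date(2015, 8, 1)},
]


def missing_name(records):
    """Symptom: missing information (here, a blank name)."""
    return [r["id"] for r in records if not r["name"].strip()]


def duplicate_names(records):
    """Symptom: duplication. Groups record ids whose names match after normalization."""
    seen = {}
    for r in records:
        key = r["name"].strip().lower()
        if key:  # blank names are reported by missing_name instead
            seen.setdefault(key, []).append(r["id"])
    return [ids for ids in seen.values() if len(ids) > 1]


def errant_identifiers(records, length=20):
    """Symptom: errant identifiers. Assumed rule: exactly `length` alphanumeric characters."""
    return [
        r["id"]
        for r in records
        if not (len(r["identifier"]) == length and r["identifier"].isalnum())
    ]


def outdated(records, as_of, max_age_days=730):
    """Symptom: outdated records. Assumed threshold: not updated in about two years."""
    return [r["id"] for r in records if (as_of - r["updated"]).days > max_age_days]


def run_diagnostics(records, as_of):
    """Run every diagnostic and collect the findings under the diagnostic's name."""
    return {
        "missing_name": missing_name(records),
        "duplicate_names": duplicate_names(records),
        "errant_identifiers": errant_identifiers(records),
        "outdated": outdated(records, as_of),
    }


report = run_diagnostics(RECORDS, as_of=date(2016, 1, 1))
for diagnostic, findings in report.items():
    print(diagnostic, findings)
```

Each check is a plain function, so the same set can be re-run against a new extract and produce findings in the same shape every time, which is where the repeatability comes from.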

This is no different from how doctors use X-rays, CAT scans, and blood work to quickly and repeatedly gather the facts they need to diagnose their patients. Now we’re doing it with data, and we’re seeing Chief Data Officers look at diagnostics as an incredibly valuable way to integrate actionable data science into their data management strategy.