Is Data Science Reserved for Big Data?

Alex Olson
4/5/16 3:36 PM

During a data quality panel I served on, a co-panelist challenged our approach of applying data science to structured data, calling it overkill. In his view, a standard columnar analysis was adequate, and the tools of data science and cognitive computing should be reserved for big data.

Our viewpoint is different – we believe that we have moved from “data artisans” to “data scientists,” and we have many tools at our disposal today that we did not have in the past. These tools include:

  • Statistical analysis
  • Natural language recognition
  • Named entity resolution
  • Neural networks
  • Predictive analytics

Should all of these tools be used on every problem? No. A data scientist understands the nature of the problem and which tools to bring to bear. The question, therefore, is not whether the data is structured or unstructured, big or small. The question is what problem needs to be solved.

The following is an example from one of our customers. They have tens of thousands of entities that do not have an ultimate parent (the top of a family tree). The data comes out of their client master, and while the data is structured, the controls needed to ensure it is managed well have not been in place. The historical approach would be to establish a team of research staff to validate the records manually. Yet this is a budget buster. Is there a different way?

We looked for patterns in the data, using statistics and predictive analytics to place records in the proper family tree. The customer received the following:

  • Proposed ultimate parents for the entities in question
  • A confidence level for each record assignment, based on previous experience with the algorithm used
  • Segmentation of results based on demographics (examples: proposed ultimate parent, entity classification, country)

The customer was able to organize their review far more intelligently than if every record had been researched manually. The cost was a fraction of that of the manual approach, with better results.
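To make the idea concrete, here is a minimal sketch in Python of how proposed parents, a score, and review segments might fit together. It is not the algorithm we used for this customer: it swaps in plain string similarity from the standard library's difflib, and every name in it (normalize, propose_parent, segment_for_review, the suffix list, the 0.6 threshold, and the sample records) is a hypothetical chosen for illustration. The score is a raw similarity between 0 and 1, not a calibrated confidence; calibrating it would require previously reviewed assignments.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Illustrative only: a real list of legal-form suffixes would be much longer.
LEGAL_SUFFIXES = {"inc", "ltd", "llc", "corp", "gmbh"}


def normalize(name):
    """Lowercase, replace punctuation with spaces, and drop legal suffixes."""
    cleaned = "".join(c if c.isalnum() else " " for c in name.lower())
    return " ".join(w for w in cleaned.split() if w not in LEGAL_SUFFIXES)


def propose_parent(entity_name, known_parents):
    """Return (best_parent, score), where score is a 0..1 string similarity."""
    target = normalize(entity_name)
    scored = [
        (SequenceMatcher(None, target, normalize(parent)).ratio(), parent)
        for parent in known_parents
    ]
    score, parent = max(scored)
    return parent, score


def segment_for_review(entities, known_parents, threshold=0.6):
    """Group entities by (proposed parent, country, bucket) for reviewers."""
    queues = defaultdict(list)
    for entity in entities:
        parent, score = propose_parent(entity["name"], known_parents)
        # The threshold is an assumed cut-off, not a tuned value.
        bucket = "likely" if score >= threshold else "needs_research"
        key = (parent, entity["country"], bucket)
        queues[key].append((entity["name"], round(score, 2)))
    return dict(queues)


if __name__ == "__main__":
    # Made-up records standing in for a client master extract.
    parents = ["Acme Holdings Inc", "Globex Corp"]
    orphans = [
        {"name": "ACME Holdings (UK) Ltd", "country": "GB"},
        {"name": "Globex Corporation GmbH", "country": "DE"},
    ]
    for key, members in segment_for_review(orphans, parents).items():
        print(key, members)
```

Reviewers could then work through the "likely" groups in bulk, one proposed parent at a time, and reserve manual research for the "needs_research" groups.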

As you think about data quality, data cleansing, and your data management, consider whether data science and cognitive technologies can materially improve your operations. We have experienced the benefits, and we believe you can as well.
