Predictive Analytics can help reduce your Data Decay

Data decay is a pervasive problem in many industries, whether it takes the form of stale customer information at a retail enterprise or outdated provider addresses and pricing on a medical insurer’s website.

Data decay can lead to frustrated customers and a lack of trust in a business. Many businesses spend thousands of dollars on manual data scrubbing, but often the entire dataset must be scrubbed to find the fraction of records that are bad. What if we could predict which data will go bad soon and preemptively identify at-risk records? This is exactly what a predictive analysis of recently scrubbed data can do.

Predictive analysis uses historical data to identify the variables that most influence an outcome. For example, starting from a recently scrubbed dataset, you could use the pre-scrubbing information (date of last contact, customer age, last purchase amount, last purchase category, and so on) as predictor variables and the scrubbing result (whether the record was good or bad) as the outcome variable. The predictive model then estimates which variables are associated with the result and how strongly.
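To make this concrete, here is a minimal sketch in Python of one such model, a logistic regression fitted by gradient descent using only the standard library. The records, field names, learning rate, and epoch count are all made-up assumptions for illustration. In practice you would more likely use a statistics package and a far larger scrubbed dataset.

```python
import math

# Hypothetical scrubbed records (values are made up for illustration):
# (days_since_last_contact, last_purchase_amount, was_bad)
RECORDS = [
    (30, 250.0, 0), (45, 180.0, 0), (60, 300.0, 0), (90, 120.0, 0),
    (200, 220.0, 0), (150, 60.0, 1), (400, 40.0, 1), (520, 75.0, 1),
    (610, 20.0, 1), (700, 150.0, 1),
]
FEATURES = ["days_since_last_contact", "last_purchase_amount"]


def standardize(rows):
    """Rescale each column to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
            for c, m in zip(cols, means)]
    scaled = [[(v - m) / s for v, m, s in zip(row, means, stds)]
              for row in rows]
    return scaled, means, stds


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic regression by batch gradient descent."""
    bias, weights = 0.0, [0.0] * len(X[0])
    n = len(X)
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * len(weights)
        for row, label in zip(X, y):
            pred = sigmoid(bias + sum(w * v for w, v in zip(weights, row)))
            err = pred - label
            grad_b += err
            for j, v in enumerate(row):
                grad_w[j] += err * v
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return bias, weights


raw = [r[:2] for r in RECORDS]      # predictor variables
labels = [r[2] for r in RECORDS]    # outcome variable: 1 = bad, 0 = good
X, means, stds = standardize(raw)
bias, weights = fit_logistic(X, labels)

# On standardized inputs, a larger absolute weight suggests a stronger
# association with the outcome; the sign gives the direction.
for name, w in sorted(zip(FEATURES, weights), key=lambda p: -abs(p[1])):
    print(f"{name}: {w:+.2f}")


def decay_risk(record):
    """Estimated probability that a not-yet-scrubbed record is bad."""
    scaled = [(v - m) / s for v, m, s in zip(record, means, stds)]
    return sigmoid(bias + sum(w * v for w, v in zip(weights, scaled)))


print(decay_risk((650, 30.0)))   # long-silent, low-spend customer
print(decay_risk((40, 260.0)))   # recently contacted, higher-spend customer
```

The same fitted weights that rank the variables can also score records that have not been scrubbed yet, which is how at-risk records could be flagged ahead of time. With only ten invented rows, the numbers themselves mean nothing; the sketch only shows the shape of the workflow.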

Choosing the right model

There are several statistical approaches to predicting data decay. Some are sophisticated learning algorithms built on complicated statistics that can be intimidating; these can usually incorporate changes over time but have a steeper learning curve. There are, however, simpler methods that can provide a holistic understanding of what is going on at a single point in time. Which to use depends on your familiarity with statistical methods and machine learning, as well as your goals for the analysis. Are you looking for a model that will be more precise but will require maintenance over time? Or are you looking for something you can run once, drill down on high-impact areas, and then work on correcting the data? It really depends on the use case, your skill level, the complexity of your data, and your expectations for the results of the analysis.

Choosing the appropriate variables

Determining which variables to put into a statistical model can be more important than the model you choose, so make an informed decision when selecting predictor variables. Rely on your past experience as well as the expertise of others to determine what should go into the model. Remember, if you don’t get significant results the first time, it’s always OK to go back to the drawing board and pick other predictor variables to test.
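One rough way to shortlist candidate variables before fitting a full model is to check how far apart the good and bad records sit on each one. The sketch below is only an illustration of that idea, not a substitute for domain expertise or a proper significance test. The field names, the values, and the `separation` helper are all hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical scrubbed records; every value here is made up.
SCRUBBED = [
    {"days_since_last_contact": 30, "customer_age": 41, "last_purchase_amount": 250.0, "was_bad": 0},
    {"days_since_last_contact": 45, "customer_age": 29, "last_purchase_amount": 180.0, "was_bad": 0},
    {"days_since_last_contact": 60, "customer_age": 55, "last_purchase_amount": 300.0, "was_bad": 0},
    {"days_since_last_contact": 90, "customer_age": 37, "last_purchase_amount": 120.0, "was_bad": 0},
    {"days_since_last_contact": 120, "customer_age": 62, "last_purchase_amount": 220.0, "was_bad": 0},
    {"days_since_last_contact": 300, "customer_age": 33, "last_purchase_amount": 60.0, "was_bad": 1},
    {"days_since_last_contact": 400, "customer_age": 58, "last_purchase_amount": 40.0, "was_bad": 1},
    {"days_since_last_contact": 520, "customer_age": 45, "last_purchase_amount": 75.0, "was_bad": 1},
    {"days_since_last_contact": 610, "customer_age": 27, "last_purchase_amount": 20.0, "was_bad": 1},
    {"days_since_last_contact": 700, "customer_age": 60, "last_purchase_amount": 150.0, "was_bad": 1},
]
CANDIDATES = ["days_since_last_contact", "customer_age", "last_purchase_amount"]


def separation(records, field, outcome="was_bad"):
    """Gap between the bad and good group means for one field,
    measured in units of that field's overall standard deviation."""
    bad = [r[field] for r in records if r[outcome] == 1]
    good = [r[field] for r in records if r[outcome] == 0]
    spread = pstdev([r[field] for r in records])
    if spread == 0:
        return 0.0  # a constant field cannot distinguish good from bad
    return abs(mean(bad) - mean(good)) / spread


# Rank candidates from most to least separated.
ranked = sorted(
    ((field, separation(SCRUBBED, field)) for field in CANDIDATES),
    key=lambda pair: pair[1],
    reverse=True,
)
for field, score in ranked:
    print(f"{field}: {score:.2f}")
```

A variable that barely separates the two groups, like the invented `customer_age` column here, is a candidate to swap out when you go back to the drawing board. This check looks at one variable at a time, so it can miss variables that only matter in combination with others.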

Once you have run your analysis, you will have a list of key variables that impact data decay. You can use this list to identify the records at risk of decay and develop a remediation plan to keep your data fresh. This may seem like a daunting task, but we can help. Contact the data decay experts at Data Ideology to set up a predictive analysis that reduces data decay today and in the future.

Written by Rebecca Gazda

Managing Director at Data Ideology

Rebecca Gazda is the Managing Director at Data Ideology, with 15+ years of experience in Data and Analytics, Statistics, and Data Team Management.
