Data preparation


This section covers best practices and considerations for preparing data for predictive analytics. The data we use for predictive analytics can often be messy...

Note that for predictive analytics, data preparation may be an iterative process. In data with a large number of predictors, it may make sense to focus first on a core set of variables that are hypothesized to be strong predictors: those specified in a benchmark predictor set or in incrementally expanded predictor sets. If validation of learners with these predictor sets indicates strong performance, then the data science team may move on to model selection and testing. However, if learner performance in the validation step is lower than desired or anticipated, then the team can circle back and work to extract more predictive value from the raw data.
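The iterative loop described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `select_predictor_set`, the `validate` callback, the target score, and the toy predictor names and scores are all hypothetical and do not come from any particular library or from this guide's own tooling.

```python
from __future__ import annotations

from typing import Callable, Optional, Sequence


def select_predictor_set(
    predictor_sets: Sequence[Sequence[str]],
    validate: Callable[[Sequence[str]], float],
    target_score: float,
) -> tuple[Optional[Sequence[str]], list[float]]:
    """Try predictor sets in order (benchmark first, then expanded sets).

    Returns the first set whose validation score meets the target, plus
    the scores seen so far. Returns None for the set if no candidate
    meets the target, which signals a return to data preparation.
    """
    scores: list[float] = []
    for predictors in predictor_sets:
        score = validate(predictors)  # hypothetical validation step
        scores.append(score)
        if score >= target_score:
            return predictors, scores
    return None, scores


# Toy usage: predictor names and scores are made up for illustration.
benchmark = ["prior_score"]
expanded = benchmark + ["attendance_rate"]
toy_scores = {1: 0.62, 2: 0.74}  # assumed validation score by set size


def toy_validate(predictors: Sequence[str]) -> float:
    return toy_scores[len(predictors)]


chosen, history = select_predictor_set(
    [benchmark, expanded], toy_validate, target_score=0.70
)
```

In this sketch, a result of `None` for the chosen set corresponds to the case where the team circles back to the raw data to engineer additional predictors before validating again.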
