Designating data for testing

Once we have identified the appropriate data for all steps (train, validate, test) of our PA proof-of-concept workflow, we want to work backwards: the first step is to split off the subset that will be used for testing. Some notes about designating the testing data:

Ultimately, for an honest assessment of our selected “best” model, we want to test it on a data set that mimics, as closely as possible, how the predictive model would be deployed.
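
As a minimal sketch of splitting off the test set first, the example below assumes a deployment scenario in which the model is trained on past records and applied to newer ones, so the most recent records are held out for testing. That scenario, the function name `split_off_test`, the field names `record_id` and `observed_on`, and the 20% test fraction are all assumptions made for illustration; a different deployment setting (for example, applying the model at a new site) would call for a different split.

```python
from datetime import date


def split_off_test(records, date_key, test_fraction=0.2):
    """Hold out the most recent records as the test set.

    Assumes the deployed model will score records newer than those it
    was trained on, so the latest slice is the closest stand-in.
    """
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    # Order records from oldest to newest.
    ordered = sorted(records, key=lambda r: r[date_key])
    # Always hold out at least one record.
    n_test = max(1, round(len(ordered) * test_fraction))
    development = ordered[:-n_test]  # later divided into train and validate
    test = ordered[-n_test:]         # set aside until the final assessment
    return development, test


# Hypothetical records, one per year, purely for illustration.
records = [
    {"record_id": i, "observed_on": date(2015 + i, 9, 1)} for i in range(10)
]

development, test = split_off_test(records, "observed_on", test_fraction=0.2)
```

The remaining `development` records would then be divided into training and validation data, while `test` stays untouched until the “best” model has been selected.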
