The importance of a proof-of-concept

After we have scoped our predictive analytics project, the next step is to conduct a “proof-of-concept.” That is, before getting too invested in incorporating predictive analytics into systems of service improvement, we need to investigate the following:

  1. Given the available data, how well can we predict our outcome of interest?

    When investigating this question, we will want to consider various metrics of model performance, which we will turn to in a later section. We will be concerned with the reliability of our results: how “accurate” are they? I put “accurate” in quotation marks because I am referring to a general assessment of model performance rather than the precise statistical definition of accuracy, which we will discuss soon, along with many other metrics of predictive performance.

  2. Given the available data, to what extent is there bias in our predictions?

    When investigating this question, we will want to consider various metrics of bias, which we will also turn to in a later section. Here we will be concerned with whether, and by how much, model performance varies across different groups; a simple group-wise check of this kind is sketched in the code following this list. Later, we will discuss how there are multiple definitions of bias and therefore multiple ways to measure it.

  3. To what extent are complex predictive models worthwhile? Do they improve upon a simpler model or decision-making process? Do they improve upon current practice?

    When investigating this question, our focus is on contrasting the predictive capabilities and biases of various prediction approaches. We start with the simplest approach, which doesn’t necessarily involve a statistical model at all. A simple decision-making process might take a small number of measures and combine and weight them in a straightforward manner based on prior knowledge. In one version of this approach, we set criteria that assign units to one predicted category or the other. For example, in the high school graduation example, perhaps any student with attendance below 90% and fewer than 20 credits is predicted to be at high risk of not graduating on time (we predict they won’t). This type of approach is sometimes called a rules-based approach or a decision tree, among other terminology; a version of this rule appears in the code sketch following this list.

    Moving to more complex approaches, we could plug a small number of measures into a regression model to estimate their relationship with the outcome. Finally, the most complex approaches might add more measures, potentially a large number, and/or use advanced, data-driven algorithms, i.e., machine learning algorithms.

    The crux, then, lies in weighing the pros and cons of the different approaches. If a more complex approach yields better results than a simpler one, we must ask: are the improvements large enough to offset losses in transparency and explainability? And do they outweigh the added challenges of implementing and sustaining a more intricate modeling process?

  4. Do stakeholders understand and trust the results from the “best” predictive model?

    While stakeholders should be engaged in the PA scoping process, the proof-of-concept provides another opportunity for valuable consultation. Sharing the results used to investigate the questions above supports transparency and trust in the process. Stakeholders’ input and questions are essential to the goal of the proof-of-concept: to assess whether and how predictive analytics should be deployed, communicated, and used to improve services.
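
To make the comparison in question 3 concrete, here is a minimal sketch, in Python, of how a simple decision rule and a logistic regression built from the same two measures might be compared on held-out data. The file and column names (`student_records.csv`, `attendance_rate`, `credits_earned`, `graduated_on_time`) are hypothetical placeholders, not references to any actual dataset.

```python
# A minimal sketch: simple decision rule vs. logistic regression.
# All file and column names below are hypothetical placeholders.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# One row per student; attendance_rate is assumed to be a proportion (0-1).
students = pd.read_csv("student_records.csv")

features = ["attendance_rate", "credits_earned"]
outcome = "graduated_on_time"  # 1 = graduated on time, 0 = did not

train, test = train_test_split(students, test_size=0.3, random_state=42)

# Rules-based approach: students with attendance below 90% AND fewer than
# 20 credits are flagged as high risk (predicted not to graduate on time).
high_risk = (test["attendance_rate"] < 0.90) & (test["credits_earned"] < 20)
rule_pred = (~high_risk).astype(int)

# A simple statistical model: logistic regression on the same two measures.
model = LogisticRegression().fit(train[features], train[outcome])
model_pred = model.predict(test[features])

print("Decision rule accuracy:      ", accuracy_score(test[outcome], rule_pred))
print("Logistic regression accuracy:", accuracy_score(test[outcome], model_pred))
```

Even this small comparison makes the trade-off in question 3 tangible: the regression may or may not predict better than the hand-built rule, and either way the rule remains easier to explain.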
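
For question 2, a bias check can begin by computing the same performance metrics separately for each group. The sketch below assumes the held-out data frame and predictions from the sketch above, plus a hypothetical demographic column named `student_group`.

```python
# A minimal sketch of a group-wise performance check. The column names
# (student_group, graduated_on_time) are hypothetical placeholders.
import pandas as pd
from sklearn.metrics import accuracy_score, recall_score

def performance_by_group(df, group_col, y_true_col, y_pred_col):
    """Compute sample size, accuracy, and recall separately for each group."""
    rows = []
    for group, subset in df.groupby(group_col):
        rows.append({
            group_col: group,
            "n": len(subset),
            "accuracy": accuracy_score(subset[y_true_col], subset[y_pred_col]),
            "recall": recall_score(subset[y_true_col], subset[y_pred_col]),
        })
    return pd.DataFrame(rows)

# Example use, continuing from the previous sketch:
# test = test.assign(model_pred=model_pred)
# print(performance_by_group(test, "student_group", "graduated_on_time", "model_pred"))
```

Large gaps between the rows of this table flag groups for whom the model performs worse and feed directly into the later discussion of bias definitions and metrics.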


For further consideration: Are there other questions you would want to investigate in a proof-of-concept?
