How are predicted probabilities distributed within and across groups?

In the code template for assessing our learner performance in testing data, we examined the distribution of predicted probabilities overall (with density plots). However, we can gain further insights by looking at how predicted probabilities are distributed within groups and across groups - whether different sites (schools, banks, store locations, program locations, etc) or different demographic groups. Insights about how predicted probabilities are clustered can guide decisions about how to allocate resources.

The following plot shows how predicted probabilities of whether students will drop out of school are distributed within a single school:

We can see that for this school, we have a balance of low-risk students and high-risk, and we also have a lot of students for whom their dropout risk is very uncertain (their predicted probabilities are in the mid range, close to 0.5).

The following plot also allows us to look at how predicted probabilities are distributed within schools but for many schools at a time.

NOTE THE Y-AXIS IS INCORRECT. I BELIEVE IT SHOULD BE NUMBER NOT PERCENT AND WE DO NOT KNOW THE SCALE BUT THERE ARE CLEARLY LARGER AND SMALLER SCHOOLS. For each school, we see three bars. The red bar indicates the number of students in the school. who have a predicted probability of dropping out that is less than or equal to 0.25. The yellow bar indicates the number of students who have a predicted probability that is greater than 0.25 but less than or equal to 0.75. And the green bar indicates the number of students who have a predicted probability greater than 0.75. This plot allows us to which schools have high concentrations or high numbers of students with a high-risk of dropping out of high school.

Back to top