Addressing bias
The following provides some guidance for addressing bias: suggestions for how to avoid bias when building models and for how to correct models when bias is detected.
Identify predictors that are likely to be less biased. Less biased predictors lead to less biased models. A deep understanding of the measures in your data, including how they are collected and entered, is important for identifying predictors that may be biased. When we discussed ethical considerations earlier in the class, we talked about the value of having subject-matter experts on the project team or as advisers. In general, researchers should try to identify variables that are believed to be measured similarly across the groups of interest. For example, returning to our example of pretrial risk algorithms, data scientists generating lists of potential risk factors might discount minor drug possession cases and focus on convictions rather than arrests, or on serious rather than minor crimes, since those measures are less likely to reflect biased enforcement.
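As a purely descriptive starting point, the sketch below compares how often prior arrests translate into prior convictions across racial groups; a large disparity in that ratio can be a warning sign that arrest-based predictors capture enforcement practices rather than underlying behavior. This is a minimal sketch, and the column names (race, prior_arrests, prior_convictions) are hypothetical and would need to match your own data.

```python
# A minimal descriptive check, assuming a pandas DataFrame with hypothetical
# columns 'race', 'prior_arrests', and 'prior_convictions'.
import pandas as pd

def arrests_vs_convictions_by_group(df: pd.DataFrame) -> pd.DataFrame:
    """Compare average arrests, convictions, and their ratio by group."""
    summary = df.groupby("race")[["prior_arrests", "prior_convictions"]].mean()
    # A group with many arrests but few convictions may indicate that
    # arrest-based measures reflect enforcement intensity, not behavior.
    summary["convictions_per_arrest"] = (
        summary["prior_convictions"] / summary["prior_arrests"]
    )
    return summary
```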
Screen for predictors that have the same relationship to the outcome across different attributes (e.g., races). Predictive models often cannot explicitly account for race or other protected attributes (that is, include these variables as predictors). In some contexts, especially in areas like criminal justice, the provision of government services, hiring, or lending, the use of race as a predictor is prohibited or limited by law to prevent discrimination. In these cases, screening predictors will be an important step. One way of screening is to fit initial models that include race-by-risk-factor interactions to identify risk factors that predict the outcome differently across groups, and then exclude those risk factors from subsequent modeling. Data scientists can compare the overall predictive accuracy of the model fit to the restricted set of factors with the model fit to all factors to determine the cost in accuracy.
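A minimal sketch of this screening step is shown below, assuming a pandas DataFrame with a binary outcome column, a categorical race column, and a list of numeric candidate risk factors; all column and variable names are hypothetical. It flags factors whose race-by-factor interaction is statistically significant and then compares the test-set AUC of a model fit to all factors with a model fit to the restricted set.

```python
# Sketch of interaction-based screening; column names are hypothetical.
import statsmodels.formula.api as smf
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def screen_predictors(df, candidates, alpha=0.05):
    """Flag risk factors whose relationship to the outcome differs by race."""
    flagged = []
    for pred in candidates:
        # Fit a small logistic model with a race-by-predictor interaction.
        fit = smf.logit(f"outcome ~ {pred} * C(race)", data=df).fit(disp=0)
        # Interaction terms contain ':' in their statsmodels labels.
        interaction_p = fit.pvalues[fit.pvalues.index.str.contains(":")]
        if (interaction_p < alpha).any():
            flagged.append(pred)
    return flagged

def compare_accuracy(df, candidates, flagged):
    """Compare test-set AUC for all factors vs. the restricted set."""
    restricted = [p for p in candidates if p not in flagged]
    train, test = train_test_split(df, test_size=0.3, random_state=0)
    aucs = {}
    for label, cols in [("all factors", candidates), ("restricted", restricted)]:
        clf = LogisticRegression(max_iter=1000).fit(train[cols], train["outcome"])
        aucs[label] = roc_auc_score(test["outcome"],
                                    clf.predict_proba(test[cols])[:, 1])
    return aucs
```

Significance testing is only one screening heuristic; the same structure works with effect-size cutoffs or cross-validated comparisons.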
In some contexts, it may make sense to incorporate the attribute of concern (e.g., race) in order to reduce bias. In certain situations, incorporating race in a predictive model may be a strategic approach to minimize bias and ensure that services are distributed more equitably. By considering race, predictive models may be calibrated to address the unique needs and challenges faced by different racial and ethnic groups, leading to fairer and more just outcomes. For instance, in healthcare, predictive algorithms have been adjusted to account for race so that resources, treatments, and interventions are allocated more fairly, reflecting differences in disease prevalence and health outcomes across groups. However, this practice has also been a subject of debate. Critics argue that the use of race in such models is an oversimplification and can perpetuate racial health disparities. Some healthcare institutions are reconsidering and revising their practices to minimize racial bias and ensure more individualized patient care that considers a broader range of biological and social determinants of health. Optionally, see this article, "If race has predictive power, it should be used in health care," which discusses this issue. This journal article also discusses the debate.
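If including race is legally and ethically permissible in a given setting, one way to examine its effect is to compare group-level calibration for models fit with and without it. The sketch below assumes a DataFrame with a binary outcome, a numeric encoding of race ("race_code"), and a list of other risk factors; all names are hypothetical, and whether including race is appropriate depends on the context discussed above.

```python
# Sketch comparing group-level calibration with and without race as a predictor.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def calibration_by_group(df, risk_factors):
    train, test = train_test_split(df, test_size=0.3, random_state=0)
    results = {}
    for label, cols in [("without race", risk_factors),
                        ("with race", risk_factors + ["race_code"])]:
        clf = LogisticRegression(max_iter=1000).fit(train[cols], train["outcome"])
        scored = test.copy()
        scored["pred"] = clf.predict_proba(scored[cols])[:, 1]
        # Mean predicted risk vs. observed outcome rate, by group: a large gap
        # within a group signals miscalibration for that group.
        results[label] = scored.groupby("race_code")[["pred", "outcome"]].mean()
    return results
```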
Identify risk-assessment outcomes that are likely to be less biased. Biased outcomes can affect both the construction of a predictive model and its evaluation. For example, in pretrial risk assessments, it might help to focus on chronic failure to appear for court dates rather than a single failure, as chronic failure is more indicative of willful behavior, whereas a single missed court date may reflect other factors such as lack of access to transportation.
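The sketch below illustrates the idea by constructing both outcome definitions from a hypothetical count of missed court dates ("fta_count") and comparing their prevalence by group; the column names are assumptions, not a fixed schema.

```python
# Sketch of defining a potentially less biased outcome; columns are hypothetical.
import pandas as pd

def add_outcome_columns(df, chronic_cutoff=2):
    """Add binary outcomes for any failure to appear vs. chronic failure."""
    df = df.copy()
    df["any_fta"] = (df["fta_count"] >= 1).astype(int)
    df["chronic_fta"] = (df["fta_count"] >= chronic_cutoff).astype(int)
    return df

def outcome_prevalence_by_group(df):
    # Comparing prevalence across groups shows how the outcome definition
    # shifts measured base rates, which affects model fitting and evaluation.
    return df.groupby("race")[["any_fta", "chronic_fta"]].mean()
```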
To address bias that may be introduced by censoring, capture differential relationships between censoring and subgroups when imputing censored outcomes. One approach to the censoring problem when validating predictive models is to impute missing outcomes for units whose outcomes cannot be observed (e.g., new criminal activity among those detained during the pretrial period). This imputation can include race and other characteristics not used for the final predictive model. (The imputation model is only used to fill in missing outcomes for model building; these characteristics will not, in the end, be used as predictors by the final predictive model.) To the extent that the imputation captures differential relationships between censoring and subgroups, the subsequent model-fitting process will be less vulnerable to biases from censoring.
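A minimal sketch of this approach follows, assuming pretrial data in a DataFrame with hypothetical columns "detained" (1 if the outcome is censored), "new_arrest" (the outcome, observed only for released defendants), a numeric "race_code", and a list of risk factors. It performs a single stochastic imputation; in practice, multiple imputation would better reflect uncertainty.

```python
# Sketch of imputing censored outcomes before fitting the final model.
# Column names are hypothetical; race is used only in the imputation step.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def impute_censored_outcomes(df, risk_factors):
    observed = df[df["detained"] == 0]           # outcome observed (released)
    censored = df[df["detained"] == 1].copy()    # outcome censored (detained)

    # The imputation model may include race and other protected attributes,
    # even though the final predictive model will not.
    impute_cols = risk_factors + ["race_code"]
    imputer = LogisticRegression(max_iter=1000).fit(
        observed[impute_cols], observed["new_arrest"])

    # Draw imputed outcomes from predicted probabilities rather than
    # hard-thresholding, to preserve uncertainty in the imputation.
    p = imputer.predict_proba(censored[impute_cols])[:, 1]
    censored["new_arrest"] = rng.binomial(1, p)
    return pd.concat([observed, censored])

def fit_final_model(df, risk_factors):
    # The final model is trained on observed plus imputed outcomes and
    # excludes race from its predictors.
    return LogisticRegression(max_iter=1000).fit(df[risk_factors], df["new_arrest"])
```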
Assess bias in the decision-making guidelines, in particular those focused on thresholds. When validating predictive models, data scientists should assess the bias (and accuracy) implications of the decision-making guidelines, not just the predicted probabilities. Comparisons of AUC-ROC or AUC-PR alone are therefore insufficient; bias needs to be evaluated at the threshold(s) that drive decisions about who receives extra intervention.
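The sketch below shows the kind of threshold-level check this implies: given true outcomes, predicted probabilities, group labels, and the operational threshold, it reports selection, false positive, and false negative rates by group. The inputs are hypothetical, and the threshold should mirror the cut-off actually used to trigger the extra intervention.

```python
# Sketch of bias checks at a decision threshold; inputs are hypothetical arrays.
import numpy as np
import pandas as pd

def threshold_metrics_by_group(y_true, y_prob, group, threshold):
    """Selection, false positive, and false negative rates by group."""
    flagged = (np.asarray(y_prob) >= threshold).astype(int)
    df = pd.DataFrame({"y": np.asarray(y_true), "flagged": flagged, "group": group})
    rows = {}
    for g, sub in df.groupby("group"):
        rows[g] = {
            "selection_rate": sub["flagged"].mean(),
            "false_positive_rate": sub.loc[sub["y"] == 0, "flagged"].mean(),
            "false_negative_rate": 1 - sub.loc[sub["y"] == 1, "flagged"].mean(),
        }
    return pd.DataFrame(rows).T
```

Large gaps in these rates across groups at the operational threshold are exactly what AUC comparisons can conceal.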
Reassess bias over time. The extent of bias may change as the population targeted by services changes, so bias (as well as predictive performance) should be monitored frequently.