
AI and Racial Bias in Legal Decision-Making: A Student Fellow Project
Artificial intelligence (AI) is increasingly used to assist in legal decision-making, from document review to predictive analytics. But what happens when AI is asked to render judgments in substantive legal cases? And is its decision-making influenced by racial bias when we measure it against the decisions of human judges?
As a student fellow at the Harvard Law School Center on the Legal Profession, my research seeks to answer these questions by comparing AI-generated judgments with human judges’ decisions in two critical areas of federal trial practice: criminal sentencing and employment discrimination cases. These areas offer unique perspectives—sentencing cases allow for comparisons across racial groups, while employment discrimination cases (often brought under Title VII) provide insight into intra-racial group disparities. By comparing AI-generated judgments to real-world federal trial court decisions, my study aims to quantify variations and determine what factors contribute to differences in outcomes.
To conduct this study, I break legal decisions down into a set of variables suitable for multi-variable analysis. I incorporate sentencing and EEOC guidelines to identify the key factors that influence verdicts. Then, I input core precedents from circuit courts and factual findings from lower trial courts into generative AI models, instructing them to render judgments.
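As a rough illustration only (this is not the project's actual pipeline), the prompting step might look like the sketch below, here using the OpenAI Python client. The model name, prompt wording, and the render_judgment helper are assumptions chosen for demonstration.

```python
# Minimal sketch of the prompting step. The model name, prompt structure, and
# helper name are illustrative assumptions, not the study's code.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def render_judgment(circuit_precedents: str, trial_facts: str) -> str:
    """Ask a generative model to decide a case from precedents and factual findings."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "system",
             "content": "You are a federal trial judge. Apply the controlling "
                        "precedents to the factual findings and render a judgment, "
                        "including any sentence in months where applicable."},
            {"role": "user",
             "content": f"Controlling precedents:\n{circuit_precedents}\n\n"
                        f"Factual findings:\n{trial_facts}"},
        ],
        temperature=0,  # reduce run-to-run variation across repeated queries
    )
    return response.choices[0].message.content
```

A structured prompt of this kind keeps the inputs (precedents and facts) separate from the instruction, which makes it easier to hold the task constant while varying only the case material.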
The differences between AI-generated judgments and actual trial court decisions (“variations”) can be analyzed both qualitatively and empirically. Qualitatively, I manually review cases where AI results deviate significantly from real-world outcomes to identify potential reasons behind the discrepancies. Empirically, I use statistical analysis to correlate these variations with specific case factors, employing a multiple correlation coefficient model in SPSS to determine how different variables influence AI’s decision-making.
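For readers who prefer code to SPSS menus, the sketch below shows an equivalent calculation in Python with statsmodels: regress the AI/human variation on the case factors and take the square root of R-squared to obtain the multiple correlation coefficient. The file name and column names are hypothetical.

```python
# Sketch of the empirical step, in Python rather than SPSS.
# Dataset and column names are hypothetical placeholders.
import pandas as pd
import statsmodels.api as sm

cases = pd.read_csv("case_variations.csv")

# Outcome: difference between the AI-generated sentence and the actual sentence.
y = cases["variation_months"]

# Predictors: case factors drawn from the sentencing and EEOC frameworks.
X = sm.add_constant(cases[["criminal_history", "offense_level",
                           "defendant_race_black", "plea_bargain"]])

model = sm.OLS(y, X).fit()
multiple_R = model.rsquared ** 0.5  # multiple correlation coefficient

print(model.summary())
print(f"Multiple correlation coefficient R = {multiple_R:.3f}")
```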
By examining variations in sentencing cases, we can directly assess potential racial bias in AI’s decision-making. Meanwhile, analyzing Title VII cases helps measure how AI interprets racial discrimination based on existing legal standards. Since sentencing is often less controlled by circuit precedents than employment discrimination law, it serves as a more direct test of AI’s inherent biases. To further refine the analysis, AI is instructed to rate various factors on a scale from 1 to 10, allowing for a weighted assessment of their influence on verdict variations.
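A simplified sketch of how those 1-to-10 ratings could feed a weighted assessment appears below; the factor names and weights are illustrative placeholders rather than the study's actual coding scheme.

```python
# Sketch of the weighted assessment of factor ratings (factors and weights are hypothetical).
# Each row holds the model's 1-10 ratings of how much a factor drove its verdict in one case.
import pandas as pd

ratings = pd.DataFrame({
    "criminal_history": [8, 6, 9],
    "offense_severity": [7, 9, 8],
    "mitigating_circumstances": [3, 5, 2],
})
weights = {"criminal_history": 0.4,
           "offense_severity": 0.4,
           "mitigating_circumstances": 0.2}

# Weighted influence score per case, which can then be compared against verdict variations.
ratings["influence_score"] = sum(ratings[factor] * w for factor, w in weights.items())
print(ratings)
```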
Why should we study the racial bias of generative AI in legal decision-making?
Despite growing interest in AI’s role in the legal industry, discussions about its potential bias remain largely theoretical. This study fills a research gap by empirically examining racial bias in AI-driven legal decision-making. Given that civil rights cases make up approximately 15 percent of a federal district judge’s docket, the ability of AI to navigate these cases fairly and accurately has significant real-world implications.
Unlike other industries where AI bias can be regulated through policy changes, bias in legal decision-making is more difficult to detect. For instance, when some state courts used COMPAS (a risk assessment tool) for criminal sentencing, its racial bias in predicting recidivism went largely unchallenged, even though empirical studies later exposed its flaws. If AI models used in legal practice inherit similar biases, they could systematically produce unfair outcomes without raising immediate red flags.
The current state of research
At this stage, I am focused on collecting data and feeding it into AI models, particularly sentencing outcomes for different racial groups in federal courts, specifically the Southern District of New York. Once data collection is complete, I will analyze the AI-generated outcomes for patterns and biases.
Several methodological challenges have emerged. First, consistency is an issue. AI models often produce varying results depending on the model version, the timing of the query, and even seemingly random factors. This inconsistency complicates efforts to measure bias reliably. Second, randomness itself poses a challenge. Since sentencing decisions are converted into numerical values (e.g., length of sentence), there is a risk that AI-generated outcomes are no different from random number generation. Determining whether AI is making meaningful legal judgments—or simply assigning numbers arbitrarily—requires further statistical testing.
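One way to run that statistical test is a permutation test: compare how closely AI-generated sentences track actual sentences against a baseline in which the AI sentences are randomly shuffled across cases. The sketch below uses hypothetical numbers purely to show the shape of the test.

```python
# Sketch of a permutation test for the "random numbers" concern (data are hypothetical).
# If AI sentences track actual sentences no better than shuffled ones,
# the model may be assigning values arbitrarily.
import numpy as np
from scipy import stats

ai_sentences = np.array([60, 24, 120, 36, 84])      # AI-generated sentences, in months
actual_sentences = np.array([57, 30, 110, 48, 72])  # matched real-world outcomes

observed_r, _ = stats.pearsonr(ai_sentences, actual_sentences)

rng = np.random.default_rng(0)
null_rs = [stats.pearsonr(rng.permutation(ai_sentences), actual_sentences)[0]
           for _ in range(10_000)]
p_value = np.mean(np.abs(null_rs) >= abs(observed_r))

print(f"observed r = {observed_r:.3f}, permutation p = {p_value:.3f}")
```

A small p-value would suggest the AI's numbers carry real information about the cases; a large one would support the worry that they are indistinguishable from arbitrary assignment.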
Concluding thoughts
AI’s increasing role in legal decision-making presents both opportunities and risks. While AI has the potential to improve efficiency and consistency in legal analysis, its susceptibility to bias remains a significant concern. This research aims to provide empirical grounding to discussions about AI fairness in the legal field, offering insights that could inform both policy decisions and the development of more equitable AI models.
As the legal profession continues to integrate AI into practice, understanding its strengths and limitations is essential. If AI is to play a role in judicial decision-making, we must ensure that it does not perpetuate or exacerbate racial biases already present in the system. By rigorously testing AI-generated judgments against real-world legal outcomes, this study seeks to contribute to that effort.
Richard Hua is a 3L at Harvard Law School and a student fellow with the Center on the Legal Profession.