Judging from Empirical Research

From The Practice September/October 2017
How randomized control trials could revolutionize pretrial release

As Harvard Law School professor Jim Greiner laid out in the lead article of this issue of The Practice, the central goal of the Access to Justice Lab (A2J Lab) is to bring more rigorous and scientific research methods to the critical problems facing the legal profession in the realm of access to justice. To do this, the Lab conducts original empirical research across a host of areas within that space. Each study has, essentially, two components: an intervention aimed at improving access to justice and a rigorous assessment of that intervention to determine its effectiveness. As the article “Drawn to Action” discusses, the Lab is at the forefront of constructing its own interventions, which then serve as launching points for some of its studies. For the other studies, the Lab’s central role is testing the effectiveness of extant interventions to see how well they work in the field. Either way, whether testing an existing intervention or one of its own design, the Lab’s central purpose remains the same: to assess access to justice interventions using hard-nosed, scientifically based research methods. This article takes a closer look at the inner workings of these rigorous assessments.

While the A2J Lab uses a number of research methods, one of the most important (and often misunderstood) is the randomized control trial (RCT). Perhaps best known for its application in the medical field, the RCT is rarely associated with the law, let alone access to justice. The A2J Lab is working to change that. To demystify what risks being an overly technical concept, this article explores what goes into an RCT within the legal profession, including what lessons can be learned from other fields. How does the Lab propose to apply methods such as the RCT to the law? What are the benefits and the drawbacks? Who needs to get onboard for this type of research to even be possible? What lessons can we learn from this sort of study, and how can we apply them to create better access to justice policy and outcomes? To contextualize and animate this discussion, we highlight one of the Lab’s central evaluations, its Pretrial Release Study, which, as we examine below, is an RCT designed to test whether a “risk assessment” algorithm used to help determine bail decisions achieves the desired outcomes.

The Public Safety Assessment

As the A2J Lab does with each of its studies, we begin with the problem at hand: pretrial release hearings. When people are arrested and charged with a crime, they are brought before a judge to determine whether they should be released before their trial takes place. One of three outcomes is possible:

  1. The individual is released without monetary conditions.
  2. The individual is released on bail, which requires the posting of cash or surety bonds.
  3. The individual is incarcerated until a subsequent decision is rendered.

But what goes into this determination? When is release granted, and when is it not? When a judge considers whether and how to grant or deny pretrial release, he or she weighs two key risks: If released, will this defendant (1) fail to appear for court or (2) commit a new (even violent) crime? If the judge errs too far toward release, the public is potentially at risk. If the judge errs too far toward incarceration, the court risks imposing undue financial and human costs on an individual presumed to be innocent.

These are big decisions, and they add up to an even bigger problem. According to the Laura and John Arnold Foundation (LJAF), more than 60 percent of the current U.S. jail population is awaiting trial, at a cost of more than $9 billion annually. Put another way, most people held in jail are incarcerated without a trial and without a guilty verdict. With such high stakes at play, one is naturally left to wonder how exactly these levels of risk are estimated. Furthermore, and more important, how can judges more accurately assess these risks to best serve public safety and the promise of justice for all?

In its research, LJAF found that high-risk defendants were often released while low-risk defendants were often detained. To help both rationalize and standardize pretrial release determinations, in 2013 LJAF created the Public Safety Assessment (PSA). Drawing from more than 1.5 million cases across more than 300 U.S. jurisdictions, the PSA uses what social science research has identified as the most predictive factors in released defendants’ failing to appear at court (FTA) and committing new criminal activity (NCA), especially new violent criminal activity (NVCA). To this end, the PSA’s designers followed two basic protocols: Use only objective inputs taken uniformly from static administrative data, and avoid constitutionally suspect inputs like race or those that courts can glean only from expensive interviews, such as employment status and length of residency in the jurisdiction. (Without a tool, a judge typically considers a host of information, including data gathered from legally fraught interviews with the defendant.) In the end, LJAF found that the answers to the following nine questions best predicted FTA, NCA, and NVCA risks:

  1. What was the individual’s age at the time of the arrest?
  2. Was the individual’s offense violent?
  3. Were there pending charges at the time of the offense?
  4. Did the individual have prior misdemeanor convictions?
  5. Did the individual have prior felony convictions?
  6. Did the individual have prior violent convictions?
  7. Has the individual failed to appear for court in the past two years?
  8. Has the individual failed to appear for court more than two years ago?
  9. Has the individual been sentenced to incarceration before?

Source: www.arnoldfoundation.org

Beyond simply identifying risk factors, the PSA attempts to improve pretrial release decisions by synthesizing this information into a simple set of scores and recommendations. LJAF constructed an algorithm that uses the answers to the questions above to produce two scores: one for an individual’s risk for FTA and one for his or her risk for NCA; in addition, it flags those at risk for NVCA. Different combinations of the nine questions are considered in assessing each of the three risks. For example, while the existence of pending charges is relevant to all three, previous FTAs more than two years removed from the time of arrest are relevant only to the FTA score (not to NCA or NVCA). For each question considered, depending on the answer, a numerical weight is assigned. The weights are then tallied, and the result is a score for the individual’s risk for FTA and a score for his or her risk for NCA (both on 1–6 scales); a separate “flag” indicates that the defendant is at risk for committing an NVCA.
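
To make the mechanics concrete, here is a minimal sketch of how such a weighted tally might look in code. The inputs, weights, and raw-to-scale conversion are invented for illustration; they are not LJAF’s published PSA weights.

```python
# Illustrative sketch only: the inputs, weights, and scale conversion below are
# hypothetical placeholders, not the actual PSA weighting published by LJAF.

def fta_raw_score(answers):
    """Tally a raw failure-to-appear (FTA) score from the relevant answers."""
    score = 0
    if answers["pending_charge"]:              # pending charge at time of offense
        score += 1
    if answers["prior_conviction"]:            # any prior conviction
        score += 1
    if answers["fta_past_two_years"] >= 2:     # two or more recent FTAs weigh more
        score += 2
    elif answers["fta_past_two_years"] == 1:
        score += 1
    if answers["fta_older_than_two_years"]:    # FTA more than two years ago
        score += 1
    return score

# The raw tally is then converted onto the 1-6 scale reported to the judge.
RAW_TO_SCALE = {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6}

def fta_scaled_score(answers):
    return RAW_TO_SCALE[min(fta_raw_score(answers), 5)]

example = {
    "pending_charge": True,
    "prior_conviction": False,
    "fta_past_two_years": 1,
    "fta_older_than_two_years": False,
}
print(fta_scaled_score(example))  # -> 3 under these illustrative weights
```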

According to LJAF, the goal was not to replace judicial discretion but to enhance it by providing the bench with consistent, reliable information on which they might base their decisions. Moreover, it should be stressed that the PSA did not rise out of a sense that judges were failing to get it right. Rather, a core driver of the tool’s development was a call for help from the judges themselves. “It was surprising to see firsthand what judges had to say when the Arnold Foundation went out into the field to ask, ‘What’s the most important issue in the criminal law sphere that needs addressing?’” says Chris Griffin, the A2J Lab’s research director. “Their answer was pretrial release decisions. They admitted that they didn’t really have the information they needed, that they were largely having to make their best guess.” Indeed, while the PSA is one of the most well-known risk assessment tools used in the courts, recent years have seen a proliferation of similar evidence-based rubrics meant to help rationalize court processes and decisions.

The Pretrial Release Study

If the PSA was designed to help judges make release decisions by improving the accuracy of pretrial release judgments—largely by increasing the release rate of low-risk defendants and decreasing that of high-risk defendants—the critical question is: Does the PSA work? Enter the A2J Lab.

In a nutshell, the A2J Lab’s Pretrial Release Study aims to assess the assessment. Does the PSA yield better outcomes in pretrial release decisions than unguided decision-making? It is important to note that the Lab didn’t create the PSA, nor is it primarily tasked with refining the PSA’s underlying algorithm. Rather, the core goal of the study is to assess whether the PSA is having the desired effect of improving a judge’s ability to make accurate pretrial release decisions. To realize this goal, the A2J Lab is undertaking a randomized control trial.

The RCT is one of the primary assessment tools the Lab uses to find what works within the legal profession. Often referred to as the “gold standard” of empirical research methods, the RCT is most often used in medical research to determine whether a particular intervention yields distinct results—for example, a drug trial in which one randomly selected group of patients receives a placebo and another randomly selected group of patients receives the experimental drug. Outlining the central underpinning principle of the method, Griffin explains, “An RCT tries to remove, as much as possible, the effects of countless background factors—things like gender, age, education, knowledge, etc.—that might also drive the outcomes in question.” He continues:

You remove all those other factors by randomly selecting some people into one group that encounters the intervention and then leave the rest of the population to experience the status quo—or rather, not encounter the intervention. By creating two randomly selected groups this way, you can reliably test whether those exposed to the intervention have different outcomes than those who were not. Moreover, and most critically, because of the randomization we can attribute those different outcomes to the intervention itself and not some background factor.
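
As a minimal sketch of that logic, the following code randomly assigns cases to a treatment group and a control group and compares an outcome rate between them. The case list and the outcome_of function are hypothetical placeholders, not the Lab’s actual pipeline.

```python
import random

def assign_groups(cases, seed=0):
    """Randomly split cases into a treatment group and a control group."""
    rng = random.Random(seed)
    treatment, control = [], []
    for case in cases:
        (treatment if rng.random() < 0.5 else control).append(case)
    return treatment, control

def outcome_rate(group, outcome_of):
    """Share of cases in a group exhibiting the outcome of interest."""
    return sum(outcome_of(case) for case in group) / len(group) if group else 0.0

# Placeholder demo: 1,000 case identifiers split at random into two groups.
treatment, control = assign_groups(range(1000), seed=1)

# Because assignment is random, background factors should balance out across
# the groups, so a gap between outcome_rate(treatment, ...) and
# outcome_rate(control, ...) can be attributed to the intervention itself.
```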

As noted in the lead story to this issue of The Practice, the legal community has yet to fully embrace the RCT (as well as other rigorous empirical methods). Members of the legal profession, notorious for their staunch commitment to proof and precedent, often demand results before implementing new methods or interventions. The problem with this mindset in the context of an RCT is that the proof and precedent come only at the end of the study. That is to say, the real-world impacts of any given intervention are largely unknowable until after it has been tested. Again, think drug trials. While all available information about what works goes into creating a new drug (see “Drawn to Action” for a perspective on this question in the law), only by running the clinical trial is the actual outcome determined. Without testing, even the best intentions may not produce the best results.

Into the field

To that end, while the A2J Lab is among the most vocal proponents of bringing this type of research to the law (see “Lessons from the Poverty Action Lab” for an example of RCTs in public policy and economics), one of the central issues it comes up against is finding partners willing to test an intervention that may or may not produce the desired outcomes. Griffin expands on the medical comparison:

Unlike in the medical context, where the patients who would be part of an RCT generally want to take advantage of a potentially helpful drug or intervention, in the law, the situation is different because your test subjects, so to speak, may be resistant to the very idea of evaluating their processes and procedures.

Often, Griffin goes on, there are difficult, intense conversations on the front end to get all parties onboard to move forward with a legal RCT. Indeed, convincing court administrators, public defenders, district attorneys, and all other lawyers involved that an RCT should be used in their courts is no small hurdle. The good news in the first PSA study launched, Griffin notes, is that the Lab found a partner in Dane County, Wisconsin, which agreed to use the PSA and was willing to serve as an RCT site to test its effectiveness. “Thankfully, and to their credit, Dane County didn’t present the typical hurdle,” comments Griffin. Indeed, according to the Lab, Judge Juan Colás, who sits on the circuit court bench and serves as its presiding judge, has been involved in conversations with the Lab about the wisdom of and need for an RCT evaluation. “I am pleased that Dane County will be adding an evidence-based tool, the PSA, to the information available to judicial officials making pretrial release decisions,” Judge Colás told the Lab. “It’s especially exciting that we are doing so as part of a randomized controlled study through Harvard Law School. I hope we’ll find that adding the PSA helps us make wiser use of pretrial incarceration and monitoring resources while lowering our failure-to-appear rate and protecting the public.”

The challenges do not stop there, however. The legal environment is often not designed to be rigorously studied, and countless nuances and obstacles require attention to pull it off. For instance, when a partner agrees to work with a researcher, one of the most critical issues is access to the data, information, and subjects necessary to carry out the randomization. Can the researcher get all the data he or she needs to effectively run the analysis? This, of course, is crucial. Even then, the need to educate stakeholders on RCTs is always present. “In one sense, once an agency or individual or group of people agrees to do the study, they are onboard and don’t really need convincing,” says Griffin. “However, sometimes it might still be necessary to continue educating partners on what it really means to randomize in the law.” The latter effort will most likely remain important throughout the process of implementing an RCT and collecting the subsequent data.

While Dane County was onboard in a broad sense, there were still countless challenges with respect to on-the-ground coordination and data access. “One major obstacle to making this work was assuring that we would have all the data we would need to run the analysis once the randomization period has concluded,” explains Griffin. “To do that well—to do it at all—has required our access to five different databases within the county.” Griffin, who is on the research team for this study, stresses the significant challenge this creates. Not only are these databases located in different offices, but they are created for distinctly separate purposes. “We have to make those databases ‘talk to each other,’” says Griffin. The good news, Griffin notes, is that in Dane County, the Lab has active partners on the ground, particularly local staffers whose job is to ensure interagency adoption of the PSA as well as on-the-ground coders who can shepherd researchers through the various databases.

Beyond the imperative to ensure reliable data collection, the Lab is also constantly educating and reeducating partners and stakeholders in the court system to keep the process running smoothly. Even after the study was approved, the Lab held numerous conversations with principal judicial officials in Dane County to reaffirm that the randomized study would in no way interfere with the fundamental rights of individuals to due process and equal protection. This, Griffin notes, is no less important than any other step in rolling out the study. The importance of a judge’s confidence that the conduct of his or her courtroom is upholding state and federal constitutional norms cannot be overstated.

Apart from the judges’ concerns, public defenders worried that the study might affect judicial discretion and that an algorithm would replace judges in determining the fates of defendants before a trial even began. Again, these are not trivial concerns, so the Lab prioritized educating public defenders to reassure them that neither the PSA nor the study was replacing judicial discretion: the tool only provides recommendations, which judges are free to consider or ignore. Similar conversations were held with the district attorney’s office and many others.

Randomization—and beyond

Apart from the obstacles, there is the study itself. The Pretrial Release Study is a relatively straightforward RCT based on two randomly selected groups of cases: those in which the judge receives the PSA recommendation and those in which he or she does not. As noted above, the PSA algorithm takes its inputs from administrative records and produces two scores that correspond to two risks from release: a score for the defendant’s risk of failing to appear for court and a score for the defendant’s risk of committing a new crime (plus a violence “flag”). In the RCT, one randomly selected group of cases will afford the judge the benefit of the PSA’s information; in the other group of cases, the judge will not have the PSA. (One Dane County commissioner, a judicial official who presides over initial appearances at which pretrial release decisions are made, oversees the vast majority of initial appearances there.)

The task then turns to tracking whether outcomes for the intervention group are discernibly better than those for the control group. Griffin explains:

To make randomization work, we needed to choose something related to the case that has nothing to do with the individual defendants or their case merits. We decided that even case docket numbers would include the PSA’s recommendation. If the case number is odd, it won’t. We worked with the Dane County Clerk’s Office, who is overseeing the production of the PSA reports, to come up with that protocol. Every morning, our two very fine assessors in Dane County, who are working up the PSA reports, see that it’s done. The software system that they’re using automatically suppresses the odd-numbered case reports and releases the even ones for them to print and provide to the commissioner. We had to work through a number of contingencies related to the nature of the charges, essentially whether the district attorney had filed them yet, and what effect filing would have on the randomization scheme. But at the end of the day, it’s a fairly simple approach whereby a piece of paper with evidence-based recommendations is made available in roughly half the cases and not in the other. 
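
A minimal sketch of that assignment rule might look like the following. The docket-number format here is a guess for illustration; Dane County’s actual case numbers and report software may differ.

```python
# Sketch of the protocol Griffin describes: even-numbered dockets receive the
# PSA report, odd-numbered dockets do not. The docket format is invented.

def receives_psa_report(docket_number: str) -> bool:
    """Return True if the numeric portion of the docket number is even."""
    digits = "".join(ch for ch in docket_number if ch.isdigit())
    return int(digits) % 2 == 0

print(receives_psa_report("2017CF001234"))  # even -> report provided to the commissioner
print(receives_psa_report("2017CF001235"))  # odd  -> report suppressed
```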

The Intimate Partner Violence Triage Study

The RCT used in the Pretrial Release Study is about as straightforward as it gets: there are two groups, one of which gets the intervention and one of which does not; then the outcomes are measured. But RCTs can be used in far more complex ways. One example is the Access to Justice Lab’s Intimate Partner Violence Triage Study (IPVT).

IPVT is all about triage in the legal context. How do you allocate limited resources most effectively when helping everyone is not possible? How do you set priorities? A classic example of this plays out in hospital emergency rooms every day, where the severity of each patient’s affliction is weighed against that of the other patients and available resources are distributed accordingly. Public interest lawyers have to make triage decisions in the course of their work, too. In other words, there is often more need for legal representation and counsel than there are lawyers available to provide those services. IPVT measures where lawyers’ time is most effectively spent to maximize the impact of the legal support they provide to victims of intimate partner violence.

The A2J Lab is using a more complex RCT framework to measure the impact of these triage decisions. The randomization works like this: In each instance, the lawyer will talk to the victim and gather as much information as possible to determine whether the given individual is best helped by legal representation or by legal self-help materials—the latter freeing up more of the lawyer’s time to help other victims. “Because the RCT is trying to determine the best use of the lawyer’s time,” explains Griffin, “the lawyer’s decision is not always going to be the one that’s followed.” Instead, the RCT employs its first randomization—essentially a coin flip—to determine whether that decision will be made by the lawyer or by a subsequent coin flip. “If that randomization results in the attorney making the decision, we’re done. The attorney’s decision holds. If it goes the other way, then comes the second coin flip.”

This second stage then determines which randomly chosen intervention the victim will receive: representation or self-help materials. Griffin explains:

When you have the RCT set up this way, you can do two important things: First, you can study whether it makes a difference that a lawyer—a human being—is making that call or whether leaving it to chance leads to equally good outcomes. In other words, we can determine if lawyers are efficiently identifying those who would not have had a better outcome without representation. Second, you can test the effectiveness of that representation relative to the provision of empowering self-help tools. So, you get answers to both questions, among others: First, is the decision to represent made well? And second, does representation make a difference? By having that two-dimensional randomization scheme, we’re able to evaluate a wider set of key questions and do so in a way that is slightly more rigorously designed than the Pretrial Release Study.
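
A minimal sketch of that two-stage scheme, with invented names and equal coin-flip probabilities assumed for illustration, might look like this:

```python
import random

# Illustrative only: names and probabilities are assumptions, not the Lab's
# actual IPVT protocol.

def assign_service(lawyer_recommendation, rng):
    """Return (who_decided, service); service is 'representation' or 'self-help'."""
    # Stage 1: a coin flip decides whether the lawyer's triage call is followed.
    if rng.random() < 0.5:
        return "lawyer", lawyer_recommendation
    # Stage 2: a second coin flip assigns the service at random.
    service = "representation" if rng.random() < 0.5 else "self-help"
    return "chance", service

rng = random.Random(42)
print(assign_service("representation", rng))  # prints the deciding path and the assigned service
```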

Check back often on the A2J Lab’s blog to hear continuing updates on the Intimate Partner Violence Triage Study.

The Pretrial Release Study is currently in its randomization and data collection phase, so final results are not yet available. As is the case with many field-based RCTs, results often take months, if not years, to collect. Griffin notes, “At this point we are just making sure that data integrity is maintained and that the information we need is being entered into the database.” When those results do come in, however, the study will have a wealth of data for each arrestee it follows. The Lab is tracking several principal outcomes of interest (a minimal sketch of such a record follows the list):

  • Duration between initial appearance and case disposition
  • Whether the arrestee was convicted of any charge
  • Whether the arrestee was sentenced to any term of incarceration and the number of months ordered to serve, if any
  • Failure to appear, both whether the arrestee missed any court date in the case as well as how many FTAs the arrestee exhibited
  • Whether the arrestee engaged in any new criminal activity and the number of separate incidents (both between initial appearance and case disposition and up to two years after initial appearance)
  • Whether the arrestee engaged in any new violent criminal activity and the number of separate incidents (both between initial appearance and case disposition and up to two years after initial appearance)
  • Whether the arrestee was released at any point between initial appearance and case disposition, the number of days the arrestee spent incarcerated during that period, and the share of days from initial appearance to case disposition that the arrestee spent incarcerated

Source: a2jlab.org
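
The list above amounts to a simple per-arrestee data record. As a minimal sketch, with field names invented for illustration rather than drawn from the Lab’s actual schema, such a record might look like this:

```python
from dataclasses import dataclass
from typing import Optional

# Field names are invented for illustration; this is not the Lab's actual schema.
@dataclass
class ArresteeOutcomes:
    days_to_disposition: int             # initial appearance to case disposition
    convicted: bool                      # convicted of any charge
    months_incarceration: Optional[int]  # sentence length in months, if any
    fta_count: int                       # failures to appear in this case
    nca_count: int                       # new criminal incidents
    nvca_count: int                      # new violent criminal incidents
    released_pretrial: bool              # released at any point before disposition
    days_detained: int                   # days incarcerated before disposition

    @property
    def share_detained(self) -> float:
        """Share of the pretrial period spent incarcerated."""
        return self.days_detained / self.days_to_disposition if self.days_to_disposition else 0.0
```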

The Lab’s Pretrial Release Study will undoubtedly generate critical insights into both the efficacy of the PSA tool and how best to conduct pretrial release operations more generally. However, as with all the Lab’s work, the most long-lasting impact arguably lies in changing the hearts and minds of those in the legal profession toward broader acceptance of the importance of rigorous, empirical testing.