Are randomized controlled trials harming nonprofits?

In a recent book, UChicago’s Nicole P. Marwell and Jennifer E. Mosley argue that the research method isn’t always the best tool

In the 1940s, medical researchers began using randomized controlled trials to assess the efficacy of health interventions. In RCTs, researchers create randomly assigned treatment and control groups, administering the potential remedy to only the first group, and comparing how the participants fared. 

This method, with its promise to tease apart cause and effect, had such seductive explanatory power that other fields began to take notice.

One of these was the social policy sector, Profs. Nicole P. Marwell and Jennifer E. Mosley write in their new book, Mismeasuring Impact: How Randomized Controlled Trials Threaten the Nonprofit Sector (2025). Advocates of conducting RCTs believed the trials could bring clarity to the messy work of helping people. Does a job training program really result in more participants getting jobs? Identify a control group and a treatment group and find out.

But as Marwell and Mosley, both professors at the Crown Family School of Social Work, Policy, and Practice, write, this well-intentioned notion has grown so powerful that the RCT has come to be seen not as one method of assessment among many but as “the only method that tells you whether or not the program works,” Marwell said, and as an important way for organizations to attract funding.

Mismeasuring Impact challenges this status quo, arguing that RCTs aren’t always the best tool for the job and calling for a more expansive approach to the evaluation of nonprofit social programs.

Questioning the role of RCTs in social policy took some courage. “There’s a lot of support for this methodology on this campus and in this city,” Mosley said. “But in the Chicago way, they are supportive of having lively debate on the topic.”

The problems with implementation

When Marwell and Mosley began speaking to nonprofit employees, funders, and even the evaluators who help organizations plan and administer RCTs, they were surprised to discover how many people had developed misgivings about how the procedure was being implemented in practice.

Many of the concerns were methodological. 

Evaluators were especially troubled by insufficient sample sizes: Social programs are often expected to have small-scale effects or to address rare issues, such as youth involvement in gun violence. To statistically detect whether these kinds of programs are working, the organizations running them might need to enroll many hundreds of participants in an RCT. 

“But most youth programs don’t serve that many youths at one time,” Marwell and Mosley write in Mismeasuring Impact. “There may not even be that many youths in the neighborhood.”
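
To see why “many hundreds” is plausible, here is a rough back-of-the-envelope sketch using the standard normal-approximation formula for comparing two proportions. The outcome rates (10 percent in the control group, 5 percent in the treatment group), the 5 percent significance level, and the 80 percent power target are illustrative assumptions, not figures from the book, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def required_n_per_group(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate participants needed in EACH group to detect a difference
    between two outcome rates with a two-sided test (normal approximation).
    All inputs are illustrative assumptions, not values from the book."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_control - p_treatment
    return math.ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)


if __name__ == "__main__":
    # Hypothetical rare outcome: 10% of control youths vs. 5% of treated youths.
    print(required_n_per_group(0.10, 0.05))
```

Under these assumptions the formula calls for roughly 430 participants in each group, or more than 850 overall, and the requirement grows quickly as the outcome becomes rarer or the expected effect smaller.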

RCTs are also plagued by problems with so-called control group contamination: People assigned to a control group who didn’t get access to an organization’s program may seek help elsewhere. This means the trial isn’t comparing the treatment to nothing; it’s comparing the treatment to a similar program delivered elsewhere.

Costly and time-consuming, RCTs are also placing a significant burden on the nonprofits conducting them. Under normal circumstances, an organization that sees an emerging need or a gap in its current offerings can quickly adjust; nimbleness is a historic strength of the nonprofit sector. But, Marwell and Mosley explain, organizations conducting expensive multiyear RCTs are essentially frozen in amber—they can’t change their programs once the trial is underway.

These implementation problems undercut a core claim made by RCT proponents: that RCTs provide the best evidence that a program either works or doesn’t. If an RCT finds that a program has no effect on outcomes, it could signal that the program is ineffective—but, Marwell says, it could also indicate that “you didn’t implement the RCT according to the very rigorous methodological standards that it requires.”

Beyond the RCT

Marwell and Mosley talked to many in the nonprofit sector who worried that the dominance of RCTs was pushing organizations to offer programs with easily measured outcomes. Yet many social needs don’t fit into the tidy cause-and-effect framework of the RCT. A homeless shelter, for example, provides vital services but may not on its own reduce homelessness. 

“If we continue down this road [where] RCTs are the only way to prove your legitimacy as a program,” Mosley said, “it does devalue those programs that are never going to be able to be part of an RCT. Is that really a world we want to live in?”

Of course, “we do need to make sure our programs are effective,” Mosley says. Fortunately, she and Marwell point out in Mismeasuring Impact, there are many tools beyond RCTs to measure efficacy.

Surveying program participants more regularly can yield essential information about what helps and what doesn’t. “Plan-do-study-act” cycles—quick, small-scale experiments—can also help organizations “iterate and improve on a continuous basis,” Marwell adds. 

That’s good for all nonprofits, including ones that will never be able to run RCTs. And what’s good for nonprofits is good for the rest of society, she says: These organizations are “such a critical part of our social safety net and a critical part of the services that help people grow and thrive.”

—A version of this story was originally published by the UChicago Magazine.