Statisticians React to the News

The rise and fall and rise of randomized controlled trials (RCTs) in international development

13 October 2020

Gil Eyal sends along this fascinating paper coauthored with Luciana de Souza Leão, “The rise of randomized controlled trials (RCTs) in international development in historical perspective.”  Here’s the story:

Although the buzz around RCT evaluations dates from the 2000s, we show that what we are witnessing now is a second wave of RCTs, while a first wave began in the 1960s and ended by the early 1980s. Drawing on content analysis of 123 RCTs, participant observation, and secondary sources, we compare the two waves in terms of the participants in the network of expertise required to carry out field experiments and the characteristics of the projects evaluated. The comparison demonstrates that researchers in the second wave were better positioned to navigate the political difficulties caused by randomization.

What were the key differences between the two waves? Leão and Eyal start with the most available explanation:

What could explain the rise of RCTs in international development? Randomistas tend to present it as due to the intrinsic merits of their method, its ability to produce “hard” evidence as compared with the “softer” evidence provided by case studies or regressions. They compare development RCTs to clinical trials in medicine, implying that their success is due to the same “gold standard” status in the hierarchy of evidence: “It’s not the Middle Ages anymore, it’s the 21st century … RCTs have revolutionized medicine by allowing us to distinguish between drugs that work and drugs that don’t work. And you can do the same randomized controlled trial for social policy” (Duflo 2010).

But they don’t buy it:

This explanation does not pass muster and need not detain us for very long. Econometricians have convincingly challenged the claim that RCTs produce better, “harder” evidence than other methods. Their skepticism is amply supported by evidence that medical RCTs suffer from numerous methodological shortcomings, and that political considerations played a key role in their adoption. These objections accord with the basic insight of science studies, namely, that the success of innovations cannot be explained by their prima facie superiority over others, because in the early phases of adoption such superiority is not yet evident.

I’d like to unpack this argument, because I agree with some but not all of it.

I agree that medical randomized controlled trials have been oversold, and even if I accept the idea of RCTs as a gold standard, I have to admit that almost all my own research is observational.

I also respect Leão and Eyal’s point that methodological innovations typically start with some external motivation, and it can take some time before their performance is accepted as clearly superior.

On the other hand, we can port useful ideas from other fields of research, and sometimes new ideas really are better.  So it’s complicated.

Consider an example that I’m familiar with:  Mister P.  We published the first MRP article in 1997, and I knew right away that it was a big deal, but it still took something like 20 years for it to become standard practice.  I remember in fall 2000, standing up in front of a bunch of people from the exit poll consortium, telling them about MRP and related ideas, and they just didn’t see the point.  It made me want to scream—they were so tied into classical sampling theory, they seemed to have no idea that something could be learned by studying the precinct-by-precinct swings between elections.  It’s hard for me to see why two decades were necessary to get the point across, but there you have it.

My point here is that my MRP story is consistent with the randomistas’ story and also with the sociologists’.  On one hand, yes, this was a game-changing innovation that ultimately was adopted because it could do the job better than what came before.  (With MRP, the job was adjusting for survey nonresponse; with RCT, the job was estimating causal effects; in both cases, the big and increasing concern was unmeasured bias.)  On the other hand, why did the methods become popular when they did?  That’s for the sociologists to answer, and I think they’re right that the answer has to depend on the social structure of science, not just on the inherent merit or drawbacks of the methods.

As Leão and Eyal put it, any explanation of the recent success of RCTs within economics must “recognize that the key problem is to explain the creation of an enduring link between fields” and address “the resistance faced by those who attempt to build this link,” while avoiding “too much of the explanatory burden on the foresight and interested strategizing of the actors.”

Indeed, if I consider the example of MRP, the method itself was developed by putting together two existing ideas in survey research (multilevel modeling for small area estimation, and poststratification to adjust for nonresponse bias), and when we came up with it, yes I thought it was the thing to do, but I also thought the idea was clear enough that it would pretty much catch on right away.  It’s not like we had any strategy for global domination.
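For readers who haven’t seen the two-step logic of MRP, here’s a minimal sketch in Python, with made-up data and a fixed shrinkage parameter standing in for a fitted multilevel model (in a real analysis you’d estimate the pooling from the data, e.g., in Stan):

```python
# A minimal sketch of the MRP idea (hypothetical cells, counts, and
# shrinkage rule).  Step 1: partially pool noisy per-cell survey means
# toward the overall mean (the "multilevel regression" part).  Step 2:
# reweight the smoothed cell estimates by each cell's share of the
# population (the "poststratification" part).

# Survey data: cell -> (number of respondents, mean response in cell).
# Cells would be demographic/geographic strata, e.g., age group x state.
survey = {
    "cell_a": (200, 0.61),
    "cell_b": (15, 0.80),   # tiny sample: this estimate is noisy
    "cell_c": (90, 0.47),
}

# Population counts per cell (e.g., from the census).  The survey over-
# or under-represents some cells; that's the nonresponse bias MRP adjusts for.
population = {"cell_a": 1_000_000, "cell_b": 3_000_000, "cell_c": 2_000_000}

# Step 1: multilevel-style shrinkage.  Each cell mean is pulled toward the
# grand mean, with less pooling for cells with more respondents.  Here kappa
# is a fixed prior "pseudo-sample size"; a fitted multilevel model would
# estimate the amount of pooling instead.
kappa = 50
total_n = sum(n for n, _ in survey.values())
grand_mean = sum(n * y for n, y in survey.values()) / total_n
smoothed = {
    cell: (n * y + kappa * grand_mean) / (n + kappa)
    for cell, (n, y) in survey.items()
}

# Step 2: poststratification, a population-weighted average of cell estimates.
pop_total = sum(population.values())
mrp_estimate = sum(smoothed[c] * population[c] for c in population) / pop_total

print(f"raw survey mean: {grand_mean:.3f}")   # just averaging respondents
print(f"MRP estimate:    {mrp_estimate:.3f}") # smoothed and reweighted
```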

The first wave of RCTs for social interventions

Where Leão and Eyal’s article really gets interesting, though, is when they talk about the earlier push for RCTs, several decades ago:

While the buzz around RCTs certainly dates from the 2000s, the assumption—implicit in both the randomistas’ and their critics’ accounts—that the experimental approach is new to the field of international development is wrong. In reality, we are witnessing now a second wave of RCTs in international development, while a first wave of experiments in family planning, public health, and education in developing countries began in the 1960s and ended by the early 1980s. In between the two periods, development programs were evaluated by other means.

Anyway, they now set up the stylized fact, the puzzle:

Instead of asking, “why are RCTs increasing now?” we ask, “why didn’t RCTs spread to the same extent in the 1970s, and why were they discontinued?” In other words, how we explain the success of the second wave must be consistent with how we explain the failure of the first.

Good question, illustrating an interesting interaction between historical facts and social science theorizing.

Leão and Eyal continue:

The comparison demonstrates that the recent widespread adoption of RCTs is due neither to their inherent technical merits nor to rhetorical and organizational strategies. Instead, it reflects the ability of actors in the second wave to overcome the political resistance to randomized assignment, which bedeviled the first wave, and to forge an enduring link between the fields of development aid and academic economics.

As they put it:

The problem common to both the first and second waves of RCTs was how to turn foreign aid into a “science” of development. Since foreign aid is about the allocation of scarce resources, the decisions of donors and policy-makers need to be legitimized.

They argue that a key aspect of the success of the second wave of RCTs was the connection to academic economics. Also, Leão and Eyal talk a lot about “nudges,” but I think the whole nudge thing is dead, and serious economists have moved past it.

Where next?

I think RCTs and causal inference in economics and political science and international development are moving in the right direction, in that there’s increasing awareness of variation in treatment effects, and increasing recognition that doing an RCT is not in itself enough.
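To make that first point concrete, here’s a toy simulation (all numbers invented) of how an RCT’s overall average treatment effect can paper over real variation across subgroups:

```python
import random

# Toy simulation of treatment-effect heterogeneity: the treatment helps
# one subgroup and slightly hurts another, yet the overall difference in
# means from the RCT looks moderately positive.
random.seed(0)

def outcome(group, treated):
    # Assumed true effects by subgroup: +2.0 for group A, -0.5 for group B.
    effect = {"A": 2.0, "B": -0.5}[group] if treated else 0.0
    return effect + random.gauss(0, 1)  # add unit-level noise

# Half the sample in each group; treatment randomized with probability 0.5.
sample = [("A" if i % 2 == 0 else "B", random.random() < 0.5)
          for i in range(10_000)]
data = [(g, t, outcome(g, t)) for g, t in sample]

def diff_in_means(rows):
    # The standard RCT estimator: mean(treated) - mean(control).
    treated = [y for _, t, y in rows if t]
    control = [y for _, t, y in rows if not t]
    return sum(treated) / len(treated) - sum(control) / len(control)

# Overall estimate is roughly +0.75 in expectation, masking the fact that
# the effect is strongly positive in A and mildly negative in B.
print(f"overall ATE estimate: {diff_in_means(data):+.2f}")
for g in ("A", "B"):
    sub = [row for row in data if row[0] == g]
    print(f"effect in group {g}:    {diff_in_means(sub):+.2f}")
```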

Andrew Gelman
USA