Justin Sandefur is a senior fellow at the Center for Global Development. Prior to joining CGD, he spent two years as an adviser to Tanzania's national statistics office and worked as a research officer at Oxford University's Centre for the Study of African Economies. His research focuses on a wide range of topics, including education, poverty reduction, legal reform, and democratic governance.
The United States is pushing to re-elect the World Bank’s twelfth consecutive American president. Does he deserve another term? Both lending growth and project performance at the Bank appear weak by historical standards, but evaluating a bank with no profit motive is inherently difficult.
If you talk privately to many World Bank staff members, you're likely to hear accusations that the Bank's current president, Jim Kim, is the worst the institution has ever had. That claim deserves some skepticism, given that the list includes one president who advocated a return to the gold standard (in 2010!) and another who staked his presidency on stamping out corruption before resigning amidst his own personal corruption scandal. What has Kim done that's so bad by comparison?
The staff's key complaint is that Kim's signature effort, a chaotic restructuring of the Bank's organization led by external McKinsey consultants, has dragged on for four years now and is hindering lending operations. Meanwhile, Kim is staking his outsized media image on pet projects like the World Bank's Ebola response which, while laudable, accounts for about 1 percent of new Bank lending since he took office.
Does the evidence support the allegation that the World Bank’s core work is really suffering?
The short answer is maybe, but it turns out to be a hard question to answer. We don't know any of the things we actually care about, like how many people were lifted out of poverty by Bank projects, or how many infant deaths or tons of CO2 emissions were averted. The World Bank produces a corporate scorecard that tracks a dizzying array of indicators, but most are either entirely outside the Bank's control (like global GDP) or weren't measured until the last few years, so they can't be used to compare Kim to his predecessors.
The best we can do over a fairly long time horizon is compare two key indicators of Bank performance: how much the Bank lends for development projects, and how well those projects perform—as assessed by the Bank's own Independent Evaluation Group. Think of them as the quantity and quality of lending.
Pushing money out the door is commonly ridiculed in the aid industry. But it's a considerably more informative metric when you're asking people to pay you back. The World Bank is not an aid agency per se, it's a bank. It has to convince borrowers that borrowing from the World Bank is worth the cost and the hassle. Supply must find demand, and the Bank has to offer a valuable service to increase lending.
Since the Bank has grown enormously over time, we compare growth rates rather than levels. Ranking World Bank presidents by annual lending increases shows that growth is at an all-time low—though as we'll see, that statement is a little unfair to Jim Kim.
At the top of the rankings is Robert McNamara, the longest-serving Bank president (1968 to 1981), who grew lending by upwards of 20 percent per year for 13 years. The runner-up, Bob Zoellick, trails far behind at 12.5 percent lending growth sustained for five years from 2007 to 2012. Zoellick's lending boom is an artifact of the global financial crisis of 2008, which drove IBRD lending (the Bank's window for middle-income countries that pay interest and have to repay loans) up from about $13 billion in 2008 to $33 billion in 2009, before declining again in subsequent years.
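For concreteness, the growth-rate comparison above is just a compound annual growth calculation. Here is a minimal Python sketch using the rounded figures cited in the text (the numbers are illustrative, not the underlying dataset):

```python
def cagr(start, end, years):
    """Compound annual growth rate implied by two lending levels `years` apart."""
    return (end / start) ** (1 / years) - 1

# The crisis-era IBRD jump cited above: ~$13 billion (2008) to ~$33 billion (2009)
crisis_jump = cagr(13, 33, 1)

# McNamara-style sustained growth: 20 percent per year compounded over 13 years
mcnamara_multiple = (1 + 0.20) ** 13  # lending grows roughly 10.7-fold
```

Sustained compounding is why McNamara tops the ranking: a steady 20 percent per year multiplies lending more than tenfold over 13 years, while Zoellick's larger one-year spike washes out once lending declines again.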
Jim Kim had the misfortune of inheriting a Bank in 2012 that was shrinking after Zoellick's lending boom. The stats indicate that he hasn't done as much to grow it back as some might have hoped, though new IBRD lending did have a healthy year in 2015, rising from about $19 billion to $24 billion.
Not only was Kim's timing bad, he also faced supply constraints. As with all presidents, the amount of concessional finance that goes through the Bank's IDA window is capped by contributions. Even excluding IDA, though, lending growth was still the slowest ever under Kim. But the Bank also faced supply constraints on IBRD lending in 2013, due to prudential limits which (while possibly the president's responsibility) meant slow lending growth didn't necessarily reflect a lack of demand. How much weight to give these caveats is a judgment call.
Low-quality projects, or low-quality data?
Money isn't everything, even in development finance. It also matters how you spend it. Lacking hard data on the impact of World Bank projects, we (and the Bank's board) are forced to rely on subjective assessments by the Bank's Independent Evaluation Group, which reviews all loans when they close.
In brief, project quality isn't measured well, and by the measures we do have, it doesn't look especially good either.
A lot depends on how you slice and dice the evaluation data though. Most World Bank projects span multiple years, and many span multiple presidents. Should projects be attributed to the president who started them, or the president who finished them? We compute both metrics.
In either case McNamara looks pretty good. In contrast, Paul Wolfowitz’s ranking varies a lot. He appears to have closed out some decent projects, and started a crop of bad ones—though it should be noted, the year-to-year variance in these evaluation scores is not huge, not to mention that it's entirely unclear how much influence the World Bank president has over the quality of project execution.
Judging Jim Kim using project evaluations is even trickier. Scores are only available for 33 projects that started (and ended, triggering an evaluation) during Kim's tenure, compared to nearly 600 for his predecessor Bob Zoellick, and over 2,600 for the long-serving Robert McNamara. That means the projects are likely to be of a different type as well—dominated by what used to be called structural adjustment loans, designed to spur policy changes rather than to build stuff. Nevertheless, using this small sample of projects, Kim falls right in the middle of the pack on quality.
Looking at projects that have closed out since Kim took office, the sample is quite a bit bigger, and the picture is a bit bleaker. That yields a sample of 585 projects, for which Kim ranks dead last on the evaluation quality metric. Again though, this may be unfair to Kim: as our colleague Scott Morris pointed out to us, some of this may be due to poorly designed projects that were rushed through during the 2009-2012 lending boom and are finally getting evaluated under Kim’s tenure.
The re-election question
For an institution that evangelizes for evidence-based decision making, the debate over the selection of the World Bank's next president—and whether to re-elect Jim Kim—has been oddly devoid of hard data on performance.
Data isn't everything. As Lant Pritchett has argued, it might be a good idea on principle to elect a woman for a change. And Kim is the twelfth consecutive American to run the World Bank, which is increasingly hard to justify as the U.S. share of the global economy declines.
In short, if the United States wants people to treat Jim Kim's presumptive re-election as legitimate, it needs to lay out the case for him on merit. Even though the data is ropey, the onus is on the Bank's board, and especially the American delegation re-nominating Kim, to be explicit about what metrics they’re using to define success.
Disclosure: Justin is married to a World Bank economist. However, she disavows all views expressed here and would prefer he stop writing about the World Bank.
Note: All the calculations in this piece are based on World Bank data and reports. In some cases, we dug through decades of financial statements. Readers interested in replicating or modifying our rankings can download the full data, including source documentation and Stata code.
This is the last in our three-part blog series on results from our polling experiment with voters and policy elites in Tanzania. Read part I on fuel subsidies and part II on cash transfers.
Even the most ardent defenders of democracy sometimes worry that popular pressure may lead to short-sighted (or populist) economic policy choices — especially in a low-income country in the midst of a resource boom. Tanzania is a good case study: it's poor, had an imperfect but somewhat competitive election last year, and recently discovered large reserves of natural gas. So, with our colleagues Nancy Birdsall, Jim Fishkin, Abel Kinyondo, and Mujobu Moyo, we polled 2,000 ordinary Tanzanians in 2015 about their views on the use of expected natural gas revenue. We then followed up with an experiment polling Tanzanian "elites," to see whether they are aligned with citizens, or could be swayed by citizens' views.
Elites are more patient, and more economically nationalist than voters
The results suggest a more complicated narrative than short-sighted voters versus prudent elites. Voters are more eager than elites to spend money now rather than save for the future, favoring spending on services like health and education over long-term capital investments. But elites are more tempted than voters by proposals to use gas for fuel subsidies or relatively high-cost domestic energy projects rather than selling gas at global prices. And elites are more skeptical than voters of transparency and international oversight of gas revenues. On those topics, not only is the public arguably more prudent than elites, but (at least in our slightly artificial experimental setting) public opinion can actually sway elites in a more cautious, economically orthodox direction (figures below).
What do we mean by elites? Defining them, much less sampling them, is tricky, so some caution is warranted in interpreting the results. We recruited respondents from attendees at high-level conferences on gas policy in Dar es Salaam, and through emails to government ministries, parliamentary offices, think tanks, academics, and industry groups. We would not claim the 125 respondents we ended up with are perfectly representative of government officials or business leaders. Our sample is, however, arguably informative about a relevant group: the self-selected elites who show up to gas policy events at 5-star hotels and take an interest in the topic.
Based on that definition, here's a snapshot of elite versus popular opinion at baseline – i.e., before any of our experimental treatments on either sample.
Elites care what people think
Our earlier posts in this series documented a process called "deliberative polling," in which we collaborated with Stanford University's Jim Fishkin to poll a representative sample of 2,000 Tanzanians, invited poll respondents to collectively deliberate about gas policy issues at a 2-day event with expert panelists and moderators, and then re-polled them. The goal was to figure out what average Tanzanians would think about complex gas policy issues if they took the time to learn about and debate the issues.
Many experts in Tanzania and abroad were skeptical ordinary Tanzanians could or should meaningfully weigh in on gas policy. In addition to seeing how public opinion moved after deliberation, we wanted to find out whether elites care what informed citizens think. So we gave half of our elite sample a preview of informed public opinion—showing them the results of the "deliberative polling" exercise before asking them to offer their own opinions.
The results suggest elites are reasonably responsive to public opinion, at least in these controlled circumstances. (All of the effects shown in the graphs below are statistically significant at the 10% level or lower).
Result #1: Popular opinion swayed elites toward selling gas rather than using it domestically.
This is arguably a case of public opinion pushing elite opinion in the direction of economic orthodoxy.
Given the limited distribution network for electricity in Tanzania, there isn’t significant capacity to absorb a large amount of the gas domestically. As the IMF points out, commercialization would require huge investments in infrastructure and may be overpriced for the Tanzanian market, depending on global price fluctuations. Exporting the gas will ensure that Tanzania collects reliable income that can then be reinvested in domestic needs.
Result #2: Public opinion increased elite support for a spending boom.
Resource-rich, developed countries are often counseled to store a majority of their resource revenue in offshore accounts, such as sovereign wealth funds, that grow the pool of revenue through investments. (See this recent paper on whether or not Tanzania should create its own SWF). However, Tanzania’s needs are urgent, and an influx of revenue could mean filling budget gaps that would otherwise require international aid or borrowing that leads to more debt. (See Venables 2016).
Result #3: Poll numbers reinforced elite support for spending gas revenue on public services rather than long-term infrastructure investment.
Investment in infrastructure facilitates mobility, industry, schooling, health service delivery, and economic growth generally. Despite progress in infrastructure spending and subsequent economic growth, Tanzania would have to spend over $2 billion annually for a decade to meet its infrastructure goals.
Public needs for better education and health systems are also great and, in this case, may be perceived as more urgent by voters. Although education spending has increased significantly in recent years, learning outcomes have actually gotten worse. According to the 2010 DHS survey, health outcomes have shown progress, such as decreases in child and adult mortality as well as increased life expectancy, but continue to require large investments.
Result #4: Elite opinion was strongly against using gas as collateral for international borrowing, and that view was strengthened when presented with evidence of popular support.
Voters were narrowly opposed to using natural gas as collateral. In contrast, elites expressed a fairly strong preference not to use the gas as collateral, and when shown voters' poll results, this opposition strengthened. Based on anecdotal conversations with the "elite" sample, we suspect opposition to borrowing is strongly related to Ghana's recent experience of converting an oil boom into a debt fiasco. In anticipation of a large windfall of oil revenue, the Ghanaian government borrowed heavily using the prospective revenue as collateral. When global oil prices plummeted, Ghana's impending boom went bust, offering a cautionary tale for policymakers in other newly resource-rich countries.
Result #5: Public opinion reinforced elites’ strong preference for government spending over direct distribution.
It's surprising that elites and voters showed such strong resistance to direct distribution. Tanzania's conditional cash transfer (CCT) social safety net program, TASAF (Tanzania Social Action Fund), has demonstrated remarkable gains in health and education. Other cash transfer models, such as Alaska's, South Africa's, and Mongolia's (which we described in the briefing materials for voters in our deliberative poll), have also had positive well-being and development outcomes. In a previous blog in this series, we explain several theories for this seemingly paradoxical outcome.
Result #6: Elite opinion shifted toward stronger transparency based on public preferences for international oversight.
Voters initially showed much stronger support for some form of international oversight to manage revenues, while elites were much less skeptical of government management at baseline. Researchers often cite greater transparency as a bedrock defense against the resource curse, assuming that more transparency – whether in the form of external budgetary review or publishing contracts (which voters and elites both supported quite adamantly) – will lead to stronger accountability between the public and policymakers. This, in turn, should decrease the risk of corruption and mismanagement. Groups such as EITI and NRGI have made this mission their calling card, as a first step in responsible resource management.
Overall we were surprised by how much elite opinion shifted in response to public poll numbers. That might imply elites are, in fact, responsive to the public, or that their initial views were weakly held or ill-informed. Alternatively, it might just mean that small cues in a survey experiment have exaggerated effects. Time will tell whether real events in Tanzanian politics resemble our results, obtained under somewhat contrived experimental conditions.
Many economists would be disappointed to see how responsive elites are to voter demands for a spending boom, but they might be heartened by elites' acquiescence to voter preferences to export gas and publish contracts. Democracy is complicated. What seems clear from these results is that neither "listening to voters" nor "resisting populist pressures" is a comprehensive recipe for responsible governance of Tanzania's natural gas reserves.
A sweeping reform proposal that made global headlines has evolved into a more modest pilot, designed to generate rigorous evidence.
Liberia's public schools are failing. After years of civil war followed by the 2014 Ebola epidemic, the Ministry of Education does not have the capacity to run the national school system. Teachers often aren’t paid, and as a result, often don’t show up. Many students don’t either. More than 60 percent of school-aged children in Liberia aren’t in school, placing Liberia in the lowest percentile of net enrollment rates in the world.
Not that going to school is a guarantee of much. Among adult women who reached fifth grade in Liberia, only 1 in 5 can read a single sentence. It’s hard to tell parents to keep their kids in school when they’re unlikely to even learn to read.
Something needs to change.
A radical proposal and a rocky reception
Against that backdrop, Liberia made international headlines earlier this year when the Ministry of Education announced a radical proposal to convert its failing primary schools into American-style charter schools.
These charter schools would be free, with no selective admissions, and staffed by qualified, unionized teachers on the government payroll.
The plan became a media sensation. "Liberia outsources entire education system to a private American firm," was the headline in South Africa's Mail and Guardian. The UN's Special Rapporteur for the right to education declared Liberia's plan "completely unacceptable" and "a blatant violation of Liberia's international obligations under the right to education." Legally, that’s highly dubious, but his comments reflected a growing tide of opinion.
So, over the next six months, the Ministry of Education went back to the drawing board and took counsel from local and international partners. What has emerged is a more modest but more robust plan, with stronger foundations for the future.
Beyond the sensationalist headlines, the Ministry responded to a variety of domestic concerns, which had a very different tenor. Rather than opposing the charter school plan, many local stakeholders wanted to be part of the program: local education operators wanted to be involved, senators were eager to see more counties included, and aid donors working in Liberia wanted to see a serious evaluation before committing any funding to current or future plans.
“Partnership Schools for Liberia” will start with fewer schools but more partners
The program that launches in September will start small, with 90 schools in the 2016/17 academic year. That’s still an ambitious plan for a poor country with limited bandwidth in the public sector. But it gives the Ministry a chance to see what works, to keep closer oversight of the process and to iterate the policy before making any decisions about the future of the program.
While Partnership Schools will be smaller (in terms of schools), it will also be bigger (in terms of the number of players involved). No one organization is getting monopoly rights over Liberia's primary schools. This reflects a key lesson from similar models in other countries: the benefit of building a government-regulated "market" of high-performing school operators, who strive to raise educational standards by learning from and competing with each other to achieve better outcomes. So, after an open and competitive bidding process led by Education Minister George Werner and his team, the Liberian government has selected seven organizations to run its new charter schools—in addition to its original agreement with Bridge International Academies—including leading Liberian educational institutions like Stella Maris Polytechnic; large international non-profits like BRAC, with a reputation for scaling up fast in hard-to-reach places; and private school chains like Rising Academies, with experience running schools in neighboring Sierra Leone.
Putting public accountability into a public-private partnership
The public side of a public-private partnership is just as important as the private side. These are public schools and, ultimately, the Ministry of Education needs the capacity to act as the duty bearer for education: responsible for all decisions to commission, scale and indeed terminate operator contracts.
To help inform those decisions, Minister Werner asked partners to commission a rigorous external evaluation of the pilot. This was commissioned via an open tender and an independent selection committee (including USAID, UNICEF, the Ministry, and expert advisors), with Innovations for Poverty Action (IPA) selected as the winning bidder. The evaluation will provide a rigorous, independent measure of the effectiveness, equity, and sustainability of the Partnership Schools in delivering quality education to Liberian children. It is a randomized controlled trial, which will help ensure that differences between charter and regular government schools reflect the true impact of the program rather than any pre-existing differences. (See an FAQ document on the evaluation here.)
The evaluation will study the impact of the Partnership Schools program across a number of dimensions, including enrollment, attendance, learning outcomes, equity and parent perception and engagement. Evidence generated from the study is intended to inform the Ministry’s and donors’ policy decisions to continue or scale up the charter school model. It’ll also contribute to the international debate about the design and role of charter school-like policies in education systems.
Beyond the RCT, the Ministry’s broader task is to ensure that the school operators are provided with the conditions they need to deliver, while being held accountable by government for the measurable results that they achieve for children.
When the status quo is unacceptable, experimentation is an obligation
Charter schools remain controversial, even in rich countries like the U.S. where they are already widespread. Regardless of how you feel about charters in general though, one thing seems clear: in Liberia, the status quo is simply unacceptable. Tina Rosenberg put this eloquently in the New York Times a few months ago:
I’ve sent my children to New York City public elementary and middle schools that are not academically selective. Our elementary school has been active in the anti-charter movement.
Liberia is different. The project should have been envisioned sooner, and the process should have been fairer. But if experimentation is justified anywhere, it’s there. It’s hard to look at Liberia’s educational system and say: Do nothing new.
When classes open in September, Liberia is going to try something new. It will be an experiment, and like any good experiment, it should be judged on the empirical data it produces—data on whether these schools deliver better learning for Liberian children, both boys and girls, urban and rural. Until then, stay tuned.
Susannah Hares is executive director of Ark’s Education Partnerships Group, a UK non-profit advising the Liberian government on the design of the Partnership Schools for Liberia program.
Justin Sandefur is a senior fellow at the Center for Global Development, and a principal investigator on the external evaluation of the Partnership Schools for Liberia program.
Satellite data suggests poverty is falling faster than we thought, but it’s probably not reliable enough to trust for targeting social programs – at least not yet.
Recently, a journalist asked me to comment for a story on a new paper to be published in Science that applies machine learning to satellite images from Google to create cheap, rough-and-ready poverty measures for several African countries. Sounded cool. Even better: they trained their algorithm on a poverty survey in Tanzania that I helped design and launch as my first job out of grad school. I think that pretty much makes me an astronaut.
Economists have long used NASA's "night lights" satellite imagery as a proxy for local economic activity. The problem with night lights is that rural areas of very poor countries like Tanzania are just blank: there's no light detectable from space whatsoever, and thus no ability to measure variation in poverty across space or changes over time.
What the Stanford team has shown is that daylight beats night lights. In short, they show how to extract information from daytime satellite images that provides a more accurate prediction of poverty than the night lights. This is a cool technique that I’m sure will (and should) get a lot of use among researchers, and will hopefully catalyze further refinements to the approach.
Figure 1. An example of the daytime satellite images and features extracted from urban areas, non-urban areas, water, and roads (left to right)
Unfortunately, beating the night lights data is still a pretty low bar since night lights can’t perceive any variation between the poorest villages and the richest villages in rural parts of much of the developing world.
New technology, old worries
For years, the World Bank has used a related technique to provide countries with “poverty maps” that cover localized areas beyond the reach of traditional surveys. The older World Bank technique uses population census data in lieu of Google satellite images to fill in areas where surveys are missing. But the conceptual challenges are similar, and non-trivial.
“The difficult and contentious issue with this work is the accuracy of these estimates, and indeed whether they are accurate enough to be useful at all. Ideally, the users of the maps, policymakers and statistical offices in the countries that use them, should be able to judge whether the maps are accurate enough for their purposes, some of which, like the allocation of poverty-relief funds, are extremely politically sensitive.
“What we are most concerned about is the possibility that the Bank is making very attractive poverty maps, whose precision is not known, but which come with statements of precision that we suspect may be seriously misleading.”
For balance, see the World Bank research department’s response here. But in any case, it’s worth asking if this old worry applies to the new satellite-based poverty maps as well.
The satellite data is probably good enough for academic purposes…
Why do we measure poverty in the first place? The Science paper lists two pretty good reasons: (i) to study the determinants of poverty, and (ii) to target social programs. My reading of the data is that satellite imagery is likely to be very useful for the first purpose, and not at all useful for the second purpose given its current accuracy at predicting poverty.
Here’s the scatter plot of actual household consumption (which is the basis for poverty measurement in most of Sub-Saharan Africa) with the satellite-based predictions on the vertical axis:
Figure 2. Predicted vs actual cluster-level assets.
(NB: The paper and the text use consumption as well as assets, but for copyright reasons we’re reproducing only images which appear in an earlier Stanford poster version. See Figure 3 in the Science paper.)
The Stanford team evaluates the success of their predictions based on the R-squared of these linear relationships. That's probably quite relevant for researchers who want to use this data as a proxy for economic development. At the high end, they can explain 55 percent of the consumption variation in Tanzania, and at the low end, only 37 percent of the variation in Malawi. That sounds pretty good, but it's probably too rosy a picture. For one thing, since part of the motivation for the method is to estimate poverty where surveys aren't available, they also report the R-squared when training the algorithm on one country and applying it to another. When you take the Tanzania estimates to Uganda, R-squared falls to 0.38. When you train the algorithm in Malawi and apply it to Nigeria, though, the R-squared falls to 0.19.
Furthermore, when the authors (reasonably, and very transparently) restrict themselves to the poorer parts of the distribution near the global poverty line, the R-squared for all countries combined falls to about 0.1. That's… not great. It turns out that daylight imagery, like night lights, is much better at predicting high levels of consumption than low levels.
…but probably not accurate enough for policy purposes
Imagine now that the Tanzanian government wants to use this data to target a social program. Poverty is a binary outcome, so rather than R-squared, it's useful to think of this in terms of misclassification rates. As a crude estimate, I took the Tanzanian survey data the Science paper used, and added some white noise to create a consumption proxy with different levels of R-squared. With an R-squared of 0.55, as the Stanford team found in Tanzania, I estimate the government would get a cluster's poverty status right about 75 percent of the time. (Note that we don't know if this would work at the household level yet, as surveys don't release household GPS data.) If the R-squared falls to the levels the authors report near the poverty line, though, it's more like a 60 percent accuracy rate.
(Stata do-file here. As an aside, I suspect Jean et al have made a fairly innocuous mistake in the PPP conversions, underestimating poverty in Tanzania; the $1.90 poverty rate in Tanzania in 2012 according to Povcalnet is 46.6 percent; I count 39 clusters with average consumption below $1.90 in the NPS data, which looks like a lot more than the number of blue dots west of the red line in the scatter plot in Figure 3 of the Science paper.)
Anyway, if my calculations are even roughly right, the satellite-based poverty measure is a little more accurate than a coin toss, but not a lot. I imagine I’d be pretty upset if the IRS only delivered my tax refund 60-75 percent of the time, or my health insurance plan only processed 60-75 percent of my claims. Again, this kind of error is fine for lots of research purposes. But contrary to popular belief, policy applications often require much greater rigor than required for publishing in academic journals.
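The back-of-the-envelope calculation above was done in Stata, but the logic is easy to sketch in a few lines of Python. Everything below is simulated and illustrative (a log-normal "consumption" distribution, arbitrary seeds, the $1.90 line), not the actual NPS survey data; the point is just that adding enough white noise to hit a given R-squared lets you read off a rough classification accuracy.

```python
import numpy as np

def classification_accuracy(consumption, poverty_line, r2, seed=0):
    """Add white noise so the noisy proxy has (approximately) the target
    R-squared with true consumption, then measure how often the proxy
    gets a cluster's poverty status right."""
    rng = np.random.default_rng(seed)
    # For proxy = x + e with e independent of x: R^2 = var(x) / (var(x) + var(e))
    noise_var = consumption.var() * (1 - r2) / r2
    proxy = consumption + rng.normal(0.0, np.sqrt(noise_var), size=consumption.shape)
    return np.mean((consumption < poverty_line) == (proxy < poverty_line))

# Illustrative cluster-level consumption: log-normal, centered around $2/day
rng = np.random.default_rng(1)
clusters = np.exp(rng.normal(np.log(2.0), 0.5, size=10_000))

acc_high = classification_accuracy(clusters, 1.90, r2=0.55)  # Tanzania-level fit
acc_low = classification_accuracy(clusters, 1.90, r2=0.10)   # near-the-line fit
```

With these made-up inputs, the high-R-squared proxy classifies clusters correctly roughly three-quarters of the time and the low-R-squared proxy only modestly beats a coin toss, which matches the flavor of the numbers above; the exact percentages depend on the assumed consumption distribution.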
Somewhere in a village in Nigeria, a young girl is sitting in school today, just like she does every day, packed onto a crowded wooden bench in a faded school uniform. She represents a victory in the global effort to get all children learning, and her presence will be recorded as progress in the global databases maintained by UNESCO and the World Bank.
There's just one catch. She's not learning anything. She tries to follow along, repeating in unison as her classmates regurgitate what her teacher reads from the blackboard. But after three years of this, she still can't really read. By the time she eventually learns the basics it'll be too little too late, and she'll drift away toward the end of primary school -- an adolescent entering the labor market with several years of schooling and roughly the education of my eight-year-old niece.
Don't bother fact-checking my story. You can't – mostly because it's hypothetical, but also because there are no facts to check. Nobody's keeping track of whether this girl or millions like her are actually learning anything. In theory, Nigeria's education ministry can tell you how many children are enrolled, and how many teachers and desks and bathrooms are in each school. But it can't tell you anything about whether anybody's learning anything. Nigeria is not alone, nor even particularly exceptional among low-income countries.
What gets measured gets managed, and for now, learning isn't.
Done well, standardized testing is an egalitarian enterprise
One of the most attractive features of standardized testing with a representative sample of children is that it forces school systems to confront the reality of children who would otherwise fall through the cracks. The poorest kids, the kids who are struggling, the ones in the worst schools in the most remote places, count just as much for the national ranking on standardized tests as the kids in model schools in rich neighborhoods.
In contrast, existing international standardized tests do roughly the opposite – they celebrate the success of the successful, and sweep most poor kids in most poor countries under the rug. Most children in the developing world are not included in the sampling frame of any of the well-known international learning assessments.
If you follow education news, you'll have heard of tests like PISA run by the OECD, which measure learning levels of fifteen-year-olds in 65 countries around the world, or TIMSS and PIRLS which do something very roughly equivalent for primary students. Every few years, PISA or TIMSS or PIRLS makes big headlines when they release rankings of countries. South Korea places reliably near the top, and China has carefully cultivated its public image by only letting affluent and fast-growing cities like Hong Kong and Shanghai participate in these tests.
The fanfare around international test scores is mostly about who's on top. Much less is written about who's on the bottom. The most obvious reason is because the kids on the bottom of the economic ladder didn't take the test.
Figure 1 shows coverage of these tests by country income level at three different points in the school cycle, roughly corresponding to the three points the UN has proposed measuring for its global goals. Most children are excluded from TIMSS, PIRLS, or PISA, and this is especially true in poorer countries. Even if you broaden the range of tests to include regional initiatives like LLECE in Latin America or PASEC and SACMEQ in Africa, most of the world's population is left out.
Where learning data is missing, developing-country NGOs are filling the gap
In 2004, an Indian NGO called Pratham launched an ambitious project to measure how many children could read and do basic arithmetic in each of India's 600-plus districts. Pratham assembled an army of over a hundred thousand volunteers through a network of community-based organizations to go village to village and test half a million children in their homes. The model spread. The test, known as ASER, spawned parallel efforts in Pakistan, then in Kenya, Uganda, and Tanzania, and more recently in Mali, Senegal, Nigeria, and Mexico.
The political theory behind ASER was that change happens at the local level. Parents need to see that their children can't read, and district officials need to see that their own schools are failing. ASER's enormous size – allowing it to produce statistics at the district level – was designed to provide that granularity, and the local implementation and simple learning metrics were designed to communicate directly to parents and local officials.
After a decade of ASER testing in India though, scores have not improved. An evaluation of these "citizen-led assessments" commissioned by the Hewlett Foundation last year was careful to highlight the many ways ASER in India and Uwezo in East Africa have influenced the education debate, but revealed some frustration at the lack of progress.
For all the appeal of the bottom-up approach, perhaps the route to school reform is not through the village. The evidence on information campaigns that seek to stimulate local school accountability by publicizing test scores in the developing world is decidedly mixed. Political scientists Evan Lieberman, Dan Posner, and Lily Tsai ran an experiment in Kenya to test the ASER theory of change by disseminating the results of ASER's sister initiative, Uwezo, back to the villages that were tested, together with exhortations to take action and get involved in their children's schools. They found no impact on parental participation, which they attributed in part to parents' general satisfaction with current learning levels.
A cheap, low-risk gamble
This grassroots approach contrasts sharply with the theory of political change espoused by many advocates of international standardized tests like PISA or PIRLS.
This theory posits that the impetus for policy change comes from the national level, not local schools. To be slightly less than charitable, the theory here is that nationalistic competition among elites drives countries to reform. For those elites, the league tables of PISA scores discussed in The Economist magazine matter more than ASER's community engagement in a remote village. Popular politics may still have a role to play, but those politics play out through a national conversation in the mass media, producing a collective realization that education is in crisis.
There is some evidence that tests like PISA have actually had this effect. Informed observers say that while Brazil (until very recently) saw its economy as an emerging global power, PISA scores revealed its students were still very much in the developing world, opening up a conversation about evidence-based education reforms. And Poland’s realization that its high school students were falling behind their German peers led to a full restructuring of the junior and senior high school curriculum.
Will this same political dynamic work in India or Nigeria, where the political institutions to translate bad test scores into accountability may be less developed? Nobody knows for sure, but it may be relatively cheap to find out.
The OECD estimates that it could add about 30 countries per year to the PISA test of fifteen-year-olds – including low-income countries requiring significant technical assistance – at a price of about a million euros each.
It's also worth noting these international tests are not nearly as onerous as the repetitive battery of tests that suburban parents in the U.S. complain about – they're done once every few years, and crucially, they focus only on a small sample of children in each country. (The odds of your child getting sampled are less than 1% in most countries.) The goal is to measure the performance of the school system as a whole, not a particular child, teacher, or school.
Getting down to specifics: start at the beginning
Setting politics aside, there is another area where homegrown NGO assessments have blazed a trail for international standardized tests to follow. Tests like ASER have focused on testing all children, age six and up, with an instrument that prioritizes very basic literacy and numeracy – implicitly shining a light at the bottom of the learning distribution rather than the top.
And if the world is going to measure learning, there's a good argument for starting at the beginning – ensuring that the littlest kids master the most basic skills.
Experts in early-childhood education, like Nobel-prize winner James Heckman, argue that the return to human capital investments is highest for the youngest children and decreases as they get older. It's somewhat unfortunate, then, that the biggest gap in international testing in Figure 1 is for the youngest kids. Almost nobody does comparable testing at early primary ages. So if the UN aims to measure global progress on basic literacy around grade two, it's going to be starting almost from scratch. Regional tests like PASEC and LLECE have recently inaugurated early-grade math and reading tests, but together they only cover about 12% of the world's children.
One practical challenge that’s particularly acute in developing countries is that kids of a given age are spread out across a wide range of grades – or not in school at all. As seen in Figure 2, the biggest share of Indian eight-year-olds are in third grade, while in Kenya they’re in second grade – and if you focus on just boys in Uganda, the biggest chunk are still in first grade. But in all cases, you have a sizeable share of eight-year-olds spread across four or five grades.
These differences in grade progression can make comparisons of learning by grade level very deceptive. The ASER and Uwezo-style tests, which are based on household samples of kids at all ages, provide a birds-eye view of this problem. Comparing India and Kenya at sixth grade gives the impression of much higher learning levels in Kenya (see Figure 3). But this is purely an illusion of different proclivities for promoting failing students. Comparing kids by age in the bottom panel of the figure shows no gap between countries whatsoever.
Advocates of grade-based tests are not oblivious to this reality. Their argument is that measuring learning levels by grade promotes accountability for schools and teachers, highlighting whether they’re keeping up with the curriculum. The counter-argument is that it’d be small consolation that Kenyan sixth-graders were keeping up with the curriculum (though they’re not) if most of the sixth-grade cohort was still back in fourth grade trying to grasp basic multiplication. Age-based testing can assess whether the system is not only reaching kids who are ready for sixth grade, but also ensuring that kids get there in the first place.
Beyond these sampling issues, there are some sound reasons why serious psychometricians balk at the simple literacy tests used by ASER and Uwezo, and even more sophisticated tools like the Early Grade Reading and Math Assessments (EGRA and EGMA) promoted by USAID. Ensuring comparability across languages and curricula is difficult. ASER and EGRA have a good comeback to this critique: normal standardized tests may be impossible in their contexts. Especially in the poorest countries where reading levels are low, you can't just herd second-graders into an empty classroom, give them a number 2 pencil and a bubble form, and expect them to fill in a multiple choice test.
But these are technical debates, subject to technical resolution. Political will is much harder to manufacture.
In an ideal scenario, the world will settle on an international testing regime that makes it possible to have national level debates – comparing Nigeria's performance to Ghana's as well as Malaysia's – while embracing NGOs’ concern for the youngest and most disadvantaged learners. For the time being, international organizations have constructed a vast system of statistics to measure mastery of complex concepts among students in Boston and Shanghai, while making do with crude metrics of enrollment and textbooks in rural Nigeria and the slums of Dhaka. Illiteracy remains a mostly silent epidemic, and it seems unlikely that we’ll fix it before we bother to measure it. It’s time for global standardized testing.
This is one of a series of blog posts from “RISE"–the large-scale education systems research programme supported by the UK’s Department for International Development (DFID) and Australia’s Department of Foreign Affairs and Trade (DFAT). Experts from the Center for Global Development lead RISE’s research team.
The Washington Area Development Economics Symposium (WADES) is an annual research conference highlighting work in development economics by researchers at leading institutions in the Washington, DC area. Researchers from George Washington University, University of Maryland, Georgetown University, Johns Hopkins University, University of Virginia, the World Bank, the International Food Policy Research Institute (IFPRI), American University, George Mason University, and the Center for Global Development are all participants in the symposium.
The research organization Aid Data has been getting a lot of attention in the aid world of late with its survey of recipient-country policymakers and practitioners on the utility, influence, and helpfulness during reform of various aid agencies. Suggests the press release: “According to nearly 6,750 policymakers and practitioners, the development partners that have the most influence on policy priorities in their low-income and middle-income countries are not large Western donors like the United States or UK. Instead it is large multilateral institutions like the World Bank, the GAVI Alliance, and the Global Fund to Fight AIDS, Tuberculosis and Malaria." That conclusion is based on average worldwide agency scores from the survey.
The Aid Data survey was a much overdue and valuable exercise that should be very useful for donors trying to improve the impact of their policy advice. The ratings of recipient countries on how useful and helpful for reform they find donor advice alongside how much donors set the agenda should be a goldmine for aid researchers, especially when more data is released. But here’s where three fundamental methodological concerns arise:
In many cases, the rankings are based on a tiny sample of self-selected respondents, giving considerable weight to a handful of respondents from sectors and countries where agencies are relatively unengaged.
Due to a quirk in the methodology, the top-ranked aid agencies were often the ones that survey respondents knew the least about, like LuxDev, Luxembourg’s development agency. And if you listen to what most respondents said about most aid agencies – i.e., they’d never worked with them or never heard of them – the picture of which donors are most influential looks a lot different.
The cost effectiveness calculations only reward aid agencies for average user satisfaction per dollar spent, with no consideration of how many users the agency reached.
Admittedly, finding a sample of policymakers from around the world is inherently difficult. The Aid Data questionnaire had a sampling frame of 54,990 people and was received by 43,439 of them. The number who replied was 6,731 (of whom 3,400 were from host governments). With 126 recipient countries involved, that’s about 53 responses per country (including about 27 government employees). As you can imagine, not every recipient has experience with every agency. Indeed, worldwide, 2,715 respondents (40 percent of respondents, 5 percent of the sampling frame) reported experience with the World Bank, 1,676 with DFID, 37 with Luxembourg, and 17 with the GAVI Alliance. That last number is less than 0.3 percent of respondents, or less than 0.04 percent of the people who received the survey. Aid Data weights the global scores in two ways. First, global scores are the average of average scores at the country level: imagine there were two responses in Brazil and three in Nigeria, for a total of five worldwide. The two Brazilian responses would be averaged, the three Nigerian scores would be averaged, and Aid Data would report the average of the two country means as its global score. Second, within each country, responses in each policy area are given the same weight even if the number of respondents differs – so the average of six health-sector answers counts the same as the average of two education-sector answers. In effect, this amplifies the influence of scores given by respondents in sectors and countries where a donor is relatively (but not completely) unengaged, as measured by recipient responses.
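The two-stage averaging can be sketched in a few lines of Python (country level only, ignoring the within-country sector weights; the Brazil and Nigeria scores below are invented purely for illustration):

```python
def global_score(scores_by_country):
    """AidData-style two-stage mean: average within each country, then
    average the country means, so every country counts equally no matter
    how many respondents it has."""
    country_means = [sum(s) / len(s) for s in scores_by_country.values()]
    return sum(country_means) / len(country_means)

# Hypothetical example: two responses in Brazil (5 and 3), three in
# Nigeria (2, 2, 2) -- scores invented for illustration.
scores = {"Brazil": [5, 3], "Nigeria": [2, 2, 2]}
two_stage = global_score(scores)                    # (4.0 + 2.0) / 2 = 3.0
pooled = sum(sum(s) for s in scores.values()) / 5   # 14 / 5 = 2.8
print(two_stage, pooled)
```

The gap between the two-stage mean (3.0) and the pooled mean (2.8) is exactly the mechanism at issue: countries with only a handful of responses move the global score just as much as well-sampled ones.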
The fourteen respondents who ranked the utility of GAVI’s advice – out of the 17 who had experience with the organization – scored it at a (country-sector weighted) average of 4.038 on a five-point scale. That’s the highest average for any agency scored. The 1,468 respondents who had experience with the World Bank scored it at a (country-sector weighted) 3.697. The 18 who scored Luxembourg put it at 3.683, which places it sixth, just behind the World Bank, in the Aid Data rankings. Comparing those three scores, it would be a brave statistician who set responses to a question answered in (at most) one out of seven sample countries by 0.03 percent of recipients against responses to a question answered in every sample country by 3.4 percent of recipients.
The AidData report doesn’t include confidence intervals for most of the estimates -- they decided not to report them so as to maximize readability and simplicity.
But roughly speaking, to get a 95% confidence of plus or minus 5% on the survey scores, you’d need 382 responses. That’s being generous and assuming the respondents are a genuinely unbiased cross section of the 54,990 people of interest for every scoring question used in the rankings -- contrary to lots of evidence in the report.
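For the curious, that 382 comes from the standard sample-size formula with a finite-population correction, assuming simple random sampling from the 54,990-person frame, maximal variance (p = 0.5), and a 95 percent confidence z of 1.96. A quick Python sketch:

```python
import math

def required_sample(margin, population, z=1.96, p=0.5):
    """Sample size needed for a +/- `margin` interval at the given
    confidence level, with the finite-population correction applied."""
    n0 = (z ** 2) * p * (1 - p) / margin ** 2   # infinite-population size
    n = n0 / (1 + (n0 - 1) / population)        # finite-population correction
    return math.ceil(n)

print(required_sample(0.05, 54_990))  # -> 382
```

Note that the correction barely bites with a frame this large: the infinite-population answer is 385, so the hard constraint really is the per-agency response counts (14, 17, 18), not the frame.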
Another approach would be to use the non-answers as information that an agency is not useful or agenda setting. We asked the team at Aid Data and they said that non-replies don’t count as a declaration that the respondent thinks an agency is useless or has no impact on the agenda, because respondents were told only to grade agencies if they had worked with that agency. Still, if a respondent has no interaction with an aid agency, that agency clearly hasn’t been of much use or very helpful to the respondent (even if they might think it was helpful in general terms). And that no one reports experience with an agency surely speaks to its “agenda setting” potential (the Illuminati and the Bilderberg Group notwithstanding): it seems unlikely that the agency has had a considerable impact on the agenda of someone who has never worked with them in any way. So there might be some justification in coding non-responses as an answer equivalent to the bottom of the scale (i.e., not influential, not helpful, etc.).
In the graphs below, we re-rank donor agencies using the Aid Data survey, but treating ‘don’t know’ as zero (and shifting all the responses onto a 0-4 scale instead of 1-5).
Unfortunately, we don’t have access to the actual survey data, so we can’t apply (or undo) the survey weights Aid Data used. That’s because the terms of Aid Data’s human subjects approval from the College of William and Mary’s IRB prohibit them from sharing the respondent-level data and Aid Data hasn’t published the number of respondents per agency per country and sector that we would need to replicate (or undo) their weighting system. For the time being, they are only sharing data from the survey that is not already in the public domain with collaborators/co-authors.
So for the time being, we’re doing the best we can with the published results. Formally, our jerry-rigged approach will only produce the ‘right’ rankings if the response rate does not correlate with average scores within the country-sector across sectors and countries. We’d welcome the Aid Data team rerunning the exercise to produce the exact numbers using this different methodology.
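The jerry-rigged adjustment itself is simple arithmetic. Here is a Python sketch using the published means and response counts, pooling all 6,731 respondents rather than applying the country-sector weights that can’t be replicated from the published report:

```python
def adjusted_score(mean_score, n_responses, n_sample):
    """Shift a published 1-5 mean onto a 0-4 scale, and count every
    non-respondent as a zero ('almost never useful')."""
    return (mean_score - 1) * n_responses / n_sample

# Published means and response counts from the report; 6,731 respondents.
gavi = adjusted_score(4.038, 14, 6_731)
world_bank = adjusted_score(3.697, 1_468, 6_731)
print(gavi, world_bank)  # the World Bank now far outranks GAVI
```

Because the 6,717 non-respondents swamp GAVI’s 14 enthusiastic graders, the adjusted ranking is driven almost entirely by how many people an agency actually reached.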
Ranking donor agencies using AidData survey
(Assuming donors you don’t know aren’t helping you)
Source: Authors’ calculations based on AidData (2015) survey report.
NB: Among respondents, we score inability to evaluate a specific donor as 0 (‘almost never useful’).
We do not adjust for non-response to the survey as a whole.
Donor agencies are ranked on three dimensions: usefulness, helpfulness, and whether they’re setting the agenda. In each case, blue font denotes the top twenty performers on our alternative ranking, and red denotes the bottom twenty.
The web of crossing lines indicates that the rankings are very fragile to how we code non-response. In some cases, donor agencies that were at the very top of the AidData rankings – like Gavi, LuxDev, or the Embassy of Finland in the case of “usefulness” – fall all the way to the bottom twenty once we adjust how we treat non-response. To be clear, that’s not to suggest the policy advice of Gavi, LuxDev, and the Embassy of Finland isn’t useful to those who work with them – just that the utility of their advice isn’t as widespread as (say) the World Bank’s.
The problem of low response rates looms particularly large when the report starts ranking on "value for money." There, scores on agenda setting are compared to agency budgets. A small budget and a large ‘agenda setting’ score is considered evidence of punching above your weight. But if only seventeen people worldwide have any experience with your agency, compared to 1,468 for another agency, even if those seventeen say your agency is good at setting the agenda, is it really evidence of greater cost effectiveness? In most countries in the sample, simply no one will have said anything about Luxembourg’s agenda-setting potential, and yet it is ranked on a global scale next to the World Bank, for which almost every country will have a score. Perhaps Luxembourg has a small budget and sets the agenda fantastically in three small island states, while the World Bank has a large budget and does not do quite so well at setting the agenda for India, China, Brazil, and Nigeria. Which one is better value for money? Again, giving an agency a zero score for each non-response rather than throwing out those data points, we can re-calculate the ‘agenda setting per dollar’ of the top ten agencies (though note that the earlier caveats about weighting still apply):
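To see why the per-dollar ranking is so sensitive, here is a toy Python comparison of the naive calculation against the non-response-adjusted one. The budget figures are made up purely for illustration; only the response counts come from the survey report:

```python
def per_dollar(score_0_4, budget_musd):
    """Naive value-for-money: a 0-4 score divided by budget ($m)."""
    return score_0_4 / budget_musd

def adjusted_per_dollar(mean_score, n_responses, n_sample, budget_musd):
    """Count non-respondents as zeros (0-4 scale) before dividing by budget."""
    return (mean_score - 1) * n_responses / n_sample / budget_musd

# A tiny agency (17 respondents) vs a big one (1,468 respondents), with
# hypothetical budgets of $50m and $3,000m respectively.
small_raw = per_dollar(4.0 - 1, 50)
big_raw = per_dollar(3.5 - 1, 3_000)
small_adj = adjusted_per_dollar(4.0, 17, 6_731, 50)
big_adj = adjusted_per_dollar(3.5, 1_468, 6_731, 3_000)
print(small_raw > big_raw, small_adj > big_adj)  # the ranking flips
```

On the naive calculation the tiny agency looks like a per-dollar star; once its 6,714 non-respondents count as zeros, the big agency comes out ahead.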
For all of the reasons above, we’d be very cautious in suggesting our alternate results are ‘right’ – or even necessarily ‘righter.’ And other approaches (re-scoring answers on the underlying questionnaire scales, trying to account for bias introduced by country or type of respondent to produce ‘demographically representative’ scores, using different weights, and so on) could dramatically change the rankings again. But our results do suggest that the top-placed agencies in the Aid Data rankings might want to be modest about their position – it might be due to factors other than greater efficiency in turning aid resources into positive and impactful policy change.
So fear not, Greece. Despite your donor agencies scoring a lowly 1.69 based on nineteen answers to the usefulness-of-advice question, placing you in 57th and last place behind Libya’s donor agencies, perhaps you are still offering great value for money. And, to repeat, the Aid Data survey exercise remains extremely valuable in pointing the way toward making advice even more effective – so here’s hoping they do it again, even bigger and better!
How can poor countries beat the resource curse? CGD research fellow Justin Sandefur returns to the Podcast hotseat to update us on a project that posed this question to ordinary people in Tanzania. CGD teamed up with REPOA to bring hundreds of Tanzanians to Dar es Salaam to debate what to do with that country’s newly-discovered natural gas deposits. This week, Justin is back to share the project’s results.
Public-private partnerships (PPPs) in education that combine public finance to provide free or subsidized access to privately delivered education are expanding in many developing countries, either to increase access where government capacity is limited or to improve learning outcomes—often with limited evidence on their success. This panel brings together experts from the policy and research spheres to review what we know about the design of effective partnerships, the hazards to be avoided, and the frontiers for new research.
The story reads like a convoluted Hollywood pitch for the sequel to Johnny Depp's Rum Diary:
An idealistic American academic devises a plan to create jobs and reduce poverty by building a new city on a tract of unoccupied land in Honduras. On the heels of his popular TED talk, and a military coup in Honduras, he convinces President Pepe Lobo and the National Congress to enact legislation clearing the way for the world's first modern "Charter City" – where Korean investors and Central American workers can do business under Canadian-style governance.
Shocking plot twist: the land is not entirely unoccupied. A Garífuna political group known as the Fraternal Organization of Black Hondurans (OFRANEH) launches an online petition calling on the Honduran Supreme Court to strike down the law creating the charter city. The roots of the dispute go back at least to the 1990s, when the Garífuna, descended from West African former slaves and indigenous Central Americans, won official recognition for their collective land rights, which led to the harassment and alleged murder of Garífuna community leaders by the henchmen of real estate developers, in turn provoking an Amnesty International campaign.
The most famous of the developers, Randy Jorgensen, has not been personally linked to any violence, but made his fortune as “Canada’s Porn King”, and is now pushing a deal to rebrand Trujillo, which [little known fact] originally inspired the term “banana republic”, as the “Banana Coast”, making it a stopping point for Panamax cruise ships. He’s busily selling beach homes to bronzed Canadian retirees, despite, of course, the ongoing kinetic war between DEA agents and Central American drug cartels in the area. And it appears the new Charter City, with hints of Canadian backing, will land right on top of all this [see maps below], in what surely must be good news for the Porn King’s land values.
It's unclear how much of this is fiction. Parts of the story bear the hallmarks of a conspiracy theory -- juxtaposing seemingly unrelated events to create a suggestive, sinister impression. But it's also easy to see why many Garífuna are skeptical that this story has a happy ending for them – especially given the Honduran government’s track record in protecting their communal land rights.
"They didn't land on the charter city. The charter city landed on them."
While the Garifuna appear to have a legitimate grievance, the solution need not be to permanently forego any public ventures that intrude on private lands – a surefire recipe for stifling economic development.
Democratic governments that respect the rule of law routinely do public infrastructure projects on occupied land.
Suppose a government wants to build a new road -- or in this case a whole city -- that would cross existing farmland. Is it justified in taking the land? The textbook response for an economist would invoke something called the Kaldor-Hicks criterion: the public benefits of the road must be big enough to generate "potential compensation" for the private costs imposed on the farmers.
There are at least two complications that arise in applying the Kaldor-Hicks logic and eminent domain to the Garífuna case. The first is the collective nature of the land rights.
Second, without disparaging Honduras's legal institutions, it is also worth stressing the need to focus on how and whether potential compensation will materialize as actual compensation for Garífuna communities. The Thorne study also notes:
"Some patronatos have also engaged in illegal land sales to outsiders. Because of such sales, and because Honduran political and legal institutions are often ineffective and corrupt, nearly all Garífuna territories suffer from multiple ownership claims. This has made foreign investment in coastal tourism contentious and difficult to manage."
Finding a credible legal process that can adjudicate these competing claims is going to be a prerequisite for the Charter City’s legitimacy. That may mean requiring the Charter City’s planners – who have tried hard to focus exclusively on blue-sky thinking and greenfield projects – to participate in fixing the land tenure system from within existing institutions.
Can the West save the Garifuna from the Porn King and the Charter City?
As charter cities seek to escape the sovereign control of their host nations, and struggle to establish internal mechanisms for voice and accountability, where will accountability come from? A couple of ideas are already in the works.
1. One of the charter city's most promising -- albeit peculiar -- milestones to date has been Mauritius's commitment to guarantee the legal framework of the new zone. Because Mauritius still uses the Judicial Committee of the Privy Council in Westminster as a final court of appeal, cases from the Honduran RED may ultimately be subject to the independent scrutiny of a UK court.
2. President Lobo appointed five international figures -- including Paul Romer and CGD's own Nancy Birdsall -- to a Transparency Commission that will oversee the workings of the RED. So far the Commission lacks legal powers to do much beyond speak out, but if and when it is legally constituted my guess is it would be deeply concerned with the issue of land rights for the Garifuna community.
Beyond these charter-city-specific mechanisms, our CGD colleague Kim Elliott, who specializes in U.S. trade with developing countries, pointed to a couple of legal and regulatory instruments that might be brought to bear on any American firm doing business in the RED.
If a company doing business in the U.S. colludes with the Honduran government to violate anyone's rights, it might be subject to a suit under the Alien Tort Statute, similar to the case brought by the Ogoni people of Nigeria against Shell and Royal Dutch Petroleum. (Of course, signs in that link suggest the Supreme Court may overturn that case and limit corporate liability in human rights cases.)
Finally, the US-Central American Free Trade Agreement (CAFTA) may provide some leverage in other rights disputes, though it's probably more relevant to labor disputes than the Garífuna's land case. For instance, since the law establishing the charter city waives Honduras's minimum wage, Kim forecasts a high likelihood that U.S. labor unions will file a complaint under CAFTA.
In theory, Honduran law and international best-practice provide pathways to reconcile responsible land-use development with indigenous communities' rights. Actual historical practice in Honduras is worrying.
Honduran courts should be the first port of call for these disputes, but the international character of charter cities suggests a role for international accountability mechanisms as well. The Transparency Committee needs to step up. The TC should also clarify whether it is helping to implement the charter city project, or providing independent oversight and accountability.
Randy Jorgensen's porn-financed real estate developments and the historical injustices inflicted on coastal Garífuna communities are, at best, tangentially related to the new charter city. But these very real grievances help explain why the Garífuna are understandably nervous that their rights will not be protected going forward. These legitimate concerns can't be dismissed lightly.
Update (September 10, 2012): Honduras has signed a memorandum of understanding with potential investors for its charter-city project without the knowledge of the project's transparency commission, prompting Paul Romer to distance himself from the project. On September 7, 2012, the five members of the nascent transparency commission who were appointed, but never officially confirmed, to oversee Honduras’s charter-city project sent an open letter to President Porfirio Lobo Sosa asking him to not proceed with the official appointment of the commission. The signees, including CGD president Nancy Birdsall, noted that “conditions have not existed to permit the Transparency Commission to play the role envisioned for this ambitious and important project.”
This set includes the household survey data, standardized test score data, and the Stata files to replicate the results in CGD Working Paper 271, "Why Did Abolishing Fees Not Increase Public School Enrollment in Kenya?"
We just ran 23 million queries of the World Bank's website. Technically, a piece of computer code did the work, occupying a PC in an empty cubicle in our office for about 9 weeks, gradually sweeping up nearly every bit of information available in the World Bank’s global database on poverty and inequality, known as PovcalNet.
Why did we go to all this trouble? The parochial answer is that we wanted to use the data for our own research and got frustrated with a World Bank website designed to dole out the data in bite-size chunks rather than the large swaths researchers might want. After a somewhat, erm, delicate negotiation with colleagues at the World Bank, we’ve just posted the resulting paper, data set, and code online, so data-oriented readers can now download the full income and consumption distributions from 952 surveys across 127 countries over 35 years in a convenient set of CSV files, rather than running repetitive queries of the PovcalNet web interface.
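To make concrete what files like these enable, here is a minimal sketch of recomputing an extreme-poverty headcount from a grouped income distribution. The column names (`income`, `pop_share`) and the decile figures are invented stand-ins for illustration, not the actual schema of the posted CSVs.

```python
import io
import pandas as pd

# Toy stand-in for one country-year distribution: mean daily income of each
# population decile (PPP dollars) and that decile's share of the population.
csv = io.StringIO(
    "income,pop_share\n"
    "0.80,0.1\n1.10,0.1\n1.40,0.1\n1.80,0.1\n2.30,0.1\n"
    "3.00,0.1\n4.00,0.1\n5.50,0.1\n8.00,0.1\n15.00,0.1\n"
)
df = pd.read_csv(csv)

POVERTY_LINE = 1.25  # the World Bank's extreme poverty line at the time

# Headcount ratio: the share of the population living below the line.
headcount = df.loc[df["income"] < POVERTY_LINE, "pop_share"].sum()
print(f"extreme poverty headcount: {headcount:.0%}")  # prints "extreme poverty headcount: 20%"
```

With the full set of distributions in hand, the same few lines can be looped over every country-year, which is exactly the kind of bulk recalculation the query-at-a-time web interface makes tedious.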
The more grandiose motivation for our 23 million web queries is that a serious public debate about global poverty and inequality goals is potentially unfolding, and serious public debate requires transparent public access to the underlying data in question. In his 2013 State of the Union speech, President Obama pledged US aid to reach a new target of zero extreme poverty within two decades, and the new World Bank president Jim Kim has made that zero poverty target the new overarching goal of World Bank policy. If US government spending and World Bank loans will hinge on these numbers, then independent researchers ought to be able to replicate the calculations, debate the many difficult and sometimes questionable judgment calls that World Bank staff make along the way, and possibly propose alternative methods.
Here are three steps the World Bank should take to make global poverty data open to the public:
1. Embrace open data standards. The PovcalNet website is great for many users. But for researchers who would like to seriously kick the tires on the Bank’s calculations, it locks the data in an unnecessary straitjacket. Give us freely accessible, machine-readable files.
2. Post the code. There’s already some dense documentation on the PovcalNet website, but it has gaps. For instance, the description of how the World Bank aggregates national poverty rates up to regional and global estimates seems sensible but quite vague. We found it impossible to replicate this aggregation, even after asking for help.
3. Release enough micro data to recreate the estimates. For many countries, there’s nothing preventing the World Bank from posting the entire unit-record micro data set, properly anonymized. For countries that object, the Bank could still release grouped data sufficient to replicate their calculations more or less from scratch.
In a matter of days, the World Bank will release the new purchasing power parity data from the International Comparison Project, which are the price deflators underlying all cross-country comparisons of poverty and real GDP. Rumors are swirling that the new numbers will lead to some significant revisions of earlier poverty and GDP estimates. What better way to keep the World Bank above the fray than by taking an aggressive stance on full data transparency?
Until then, we hope our clunky solution here will make it slightly easier for independent researchers to delve into the public debate.
Interestingly, we're not the first people to try this. After the fact, we learned that Sanjay Reddy of the New School for Social Research made a similar effort several years ago, but the World Bank server hosting the poverty data crashed before the process was completed.
To be clear, no amount of web-scraping on our end can do these things. We accessed only publicly available information on the Bank website — we're not hackers who gained access to anything confidential. So we can help a bit with #1, but not #2 or #3. Also, most of the numbers PovcalNet publishes are only modeled results, not “raw” data. That makes our web scraping look quite silly in some cases — sort of like doing handwriting analysis of a typed page. We wish the World Bank didn’t use these modeled approximations. Fortunately, as we document in our paper, for over 30% of the country-years in the database, they don’t. So while our web-scraping still doesn’t access the original survey data, it does turn up new information each time, even after millions of queries.
A couple years ago, Alan Krueger, then chairman of President Obama’s Council of Economic Advisers, made a big splash by highlighting a relationship he christened “The Great Gatsby Curve.” Simply put, data from multiple OECD countries showed that high income inequality was associated with less economic mobility.
The Gatsby curve struck a nerve, I think, because it suggested rising inequality was a threat to the mythology of America as a land of opportunity, where hard work is all it takes to succeed. Miles Corak, whose research underpinned Krueger’s original graph, posed the political question bluntly:
Is inequality a good thing, reflecting the fruits of skill and ambition and offering a promise of possibility for the next generation? Or does it skew opportunity, crudely mirroring the power of privilege and place and reflecting unfair barriers to success regardless of talent?
The Gatsby curve hints at the latter, uglier hypothesis.
So much for the American dream. But what about the rest of the world? My undergraduate textbook on development economics offered a more Panglossian scenario, in which growth and technical progress in developing countries necessitated an initial rise (then fall) in inequality, creating Kuznets’s inverted-U shape. Does the Gatsby Curve relationship hold for developing countries as well?
Comparable measures of intergenerational mobility aren’t always available outside of rich countries. Krueger’s original graph had just 11 countries, which others have extended to over 20. To expand further, I decided to look at the intergenerational transmission of learning outcomes in lieu of income. My alternative measure of mobility (or the lack thereof) is the relationship between parental wealth and children’s scores on the PISA exam. That takes us up to a sample of 52 countries with PISA and Gini inequality data, including developing countries like Vietnam, Indonesia, Brazil, and Peru. For Stata users, the .do files to recreate these graphs are available here.
The PISA is a standardized test administered to 15-year olds in dozens of countries every three years, most recently in 2012. In an earlier post I showed the unsurprising result that rich kids do better on PISA, so much so that rich kids in poorer countries score just as well as their counterparts in rich countries. The strength of that relationship between wealth and scores varies a lot across countries, though.
If your household is one standard deviation richer than your neighbors in Brazil, you can expect to score about 0.3 standard deviations higher on the PISA. If you have the same wealth gap in Canada, you get basically no advantage in test scores.
Armed with these correlations, we can turn to the actual Gatsby curve. The results are striking.
The graphs above show that countries with high inequality have very big gaps in test scores between rich and poor kids. The correlation between the Gini coefficient of income inequality (on the horizontal axis) and my measure of intergenerational immobility (i.e., how well parental wealth predicts test scores, on the vertical axis) is high (about 0.71 for reading and 0.75 for math) and highly statistically significant.
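The two-step procedure behind these graphs — first estimate each country's wealth gradient in test scores, then correlate those gradients with inequality — can be sketched on synthetic data. The country labels, "true" slopes, and Gini values below are invented for illustration; the real analysis uses the PISA micro data and published Gini coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)

# name: (true wealth->score slope, Gini coefficient) -- all made up
countries = {"A": (0.30, 0.55), "B": (0.10, 0.32), "C": (0.22, 0.45)}

slopes, ginis = [], []
for true_slope, gini in countries.values():
    wealth = rng.standard_normal(500)             # standardized wealth index
    score = true_slope * wealth + rng.standard_normal(500) * 0.5
    slope, _ = np.polyfit(wealth, score, 1)       # step 1: country gradient
    slopes.append(slope)
    ginis.append(gini)

# Step 2: the "Gatsby" correlation between inequality and immobility.
r = np.corrcoef(ginis, slopes)[0, 1]
print(f"correlation(Gini, wealth gradient) = {r:.2f}")
```

Because the synthetic slopes were chosen to rise with the Gini, the correlation comes out strongly positive; the substantive finding in the post is that the real PISA data behave the same way.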
Bear in mind that the Gatsby curve implies the poor are doubly damned. The upward slope means not just that rich people use their wealth to buy better education for their kids, but that as inequality increases, families with the same wealth levels pass on more of that advantage to their children. So not only is the economic divide between rich and poor wider in Brazil than Canada, its effects are also stronger dollar for dollar.
The results here mirror those found by John Jerrim and Lindsey Macmillan at London’s Institute of Education. They find that the Gini coefficient predicts both the rich-poor gap in numeracy scores, as well as university graduation and other educational outcome indicators for a smaller set of richer countries (h/t Lee Crawfurd).
So what should we conclude from all this?
It’s not obvious that the Gatsby Curve is actionable. Krueger has strongly implied that he thinks causation runs from inequality to low mobility (see last line here). If a policymaker asks you how to fix inequality of opportunity for kids at school, and you tell her to “first, fix income inequality,” she could be forgiven for laughing out loud. In general, I have a mild allergy to social science questions where the independent variable is a complex macro phenomenon rather than a policy lever, like the literature showing conflict undermines economic performance (great, now all we need is world peace) or climate change contributes to civil conflict (excellent, just solve global warming).
Krueger is a bit more optimistic, arguing that by casting the issue in terms of lack of opportunity, the Gatsby curve has broadened the ideological support for addressing inequality.
The two key remaining questions now are: (1) What are the main mechanisms underlying the Great Gatsby Curve? (2) What policy actions can be taken to improve economic opportunities for children born in disadvantaged circumstances? Learning more about the former can help us to achieve the latter — which is, in the end, the most important goal of all.
Finding a Gatsby Curve in test score data at age 15 can help with the first question. It implies that education quality is not only a key mechanism for intergenerational mobility (we probably knew that), but one that apparently shuts down in more unequal societies like Chile or Brazil.
It’s less clear the Gatsby Curve or even the underlying data can help with the second question. I’m fairly skeptical of the cross-country literature that has attempted to correlate scores on PISA and other standardized tests with features of schooling systems around the world, to determine the best arrangements to promote learning.
(Depending on your ideological inclinations, though, it might be really tempting to look at the Chilean data point in the graph — with super high inequality and intergenerational transmission — and speculate about the negative side effects of Chile’s nationwide adoption of private school vouchers in the 1980s.)
Perhaps more depressing, the Gatsby-curve relationship in test score data is a reminder of how closely intertwined our notion of academic merit is with economic inequality. By age fifteen, the advantages of being born at the top of a very unequal society already start to look like just deserts. From the perspective of employers and university admissions committees, plutocracy looks like a meritocracy.
This was perhaps all F. Scott Fitzgerald wanted us to learn. “‘Whenever you feel like criticizing any one,’” the narrator recalls his father telling him in the book’s famous opening lines, “‘just remember that all the people in this world haven’t had the advantages that you’ve had.’ … Reserving judgments is a matter of infinite hope.”
For all of its attractions, the charter city model – and the Honduran plan in particular – still faces significant challenges before it can muster the kind of widespread support from potential partners it needs to succeed. What follows are three big questions we think supporters of the new Honduran city need to address, and two ways to do so.