Stanford statisticians: Exploring uncertainty

May 7, 2020

Policymakers use predictive models of the COVID-19 pandemic to decide, for example, when it’s safe to reopen schools and businesses. Although using data to inform policy might seem uncontroversial, statisticians and other scientists continue to debate which data samples should be used to predict the spread of the pandemic accurately.

For example, on March 16, as the coronavirus pandemic began spreading, a British research team released an epidemiological model predicting that more than 500,000 people in the United Kingdom could die if the government took no action to control the virus. Ten days later, after U.K. officials announced new social restrictions, the researchers lowered the projected death count to about 20,000.

Revised forecasts are common in statistical modeling as new data come to light and new policies are implemented. But according to a recent Stanford study, virtually all epidemiological models have an Achilles heel that can result in unrealistic, misleading predictions.

At issue is a mathematical term called the basic reproductive number, or R₀ (pronounced “R-naught”), popularized in the 2011 film Contagion and often cited in news stories about COVID-19.

“R₀ is the key parameter used in almost all contagion models to measure the transmission potential of a disease,” said study co-author Susan Holmes, a professor of statistics and the John Henry Samter University Fellow in Undergraduate Education. “It’s the average number of people who will get a disease from an infected person in a population where everyone is susceptible.”

For example, if R₀ for measles is 25, then, on average, each case of measles would result in 25 new ones. Each newly infected person could then infect 25 other people, causing the disease to spread exponentially.
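The arithmetic behind that exponential growth can be sketched in a few lines of Python. This is an illustration only, not the researchers' code: it assumes every case infects exactly R₀ others and that nobody is immune, and the function name is hypothetical.

```python
def new_cases_by_generation(r0, generations, initial_cases=1):
    """Return the number of new cases in each generation of spread.

    Idealized assumption: each case infects exactly r0 others and
    everyone is susceptible. This is not the study's model.
    """
    cases = [initial_cases]
    for _ in range(generations):
        # Every case in the latest generation produces r0 new cases.
        cases.append(cases[-1] * r0)
    return cases


# The article's hypothetical measles value of 25, followed for three generations.
print(new_cases_by_generation(25, 3))  # [1, 25, 625, 15625]
```

Under these assumptions, a single case leads to more than 15,000 new cases by the third generation of transmission.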

In the March 16 British model, researchers used China’s R₀ value of 2.4 to model future fatalities in the U.K. and the United States. Using that number, the model generated worst-case scenarios of 510,000 deaths in the U.K. and 2.2 million in the U.S.

“In most models, R₀ is a fixed number,” said study lead author Claire Donnat, a doctoral student in statistics. “These models assume that the virus will behave the same in all countries. This is a very coarse simplification, in part because the daily contact rate varies across countries because of different policies and social factors. Our goal was to quantify the impact of questioning that assumption by integrating heterogeneity into the model on predictive scenarios.”

For their study, Donnat and Holmes created a new model that replaces the fixed R₀ with a range of reproductive numbers tailored to specific locations.

“Instead of using a single reproductive number, our model considers a distribution of numbers for different regions based on population density and the number of confirmed cases,” Donnat said.

Using published data collected during the pandemic in February and March, the researchers were able to generate nuanced predictions for 18 countries, states and territories.

One example is New York. Instead of a fixed R₀ of 2.4, the new model based on early infection data produced a distribution of reproductive numbers exclusively for New York ranging from 4.1 to 5.1. The high values accurately mirrored the state’s high density and high infection rate. For Hong Kong, which has managed to control the pandemic, the model generated much lower reproductive numbers ranging from 0.9 to 1.5.

“People talk about the basic reproductive number as if it's a constant,” Holmes said. “But actually it's a variable that depends on a combination of factors, like the weather, population density and cultural behavior. Japan, for example, has a high population density, but people often wear masks and they know how to leave space. In Italy, people typically say hello by kissing both cheeks.”

The study found that using a range of reproductive numbers, instead of a fixed R₀, dramatically altered the predicted rate of hospitalizations and deaths.

“Assuming R₀ to be constant comes at a huge cost in terms of the accuracy of predictive scenarios,” Donnat said. “In particular, the variable model shows the potential existence of a wider scope of worst-case scenarios, which could be crucial for policymakers to make informed decisions.”
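A toy simulation can suggest why a range of reproductive numbers widens the set of worst-case scenarios. The sketch below is not the study's model: the growth formula is a deliberate simplification, the 1.5 to 3.3 range is invented for illustration (it is simply centered on the fixed value of 2.4 cited above), and all function names are hypothetical.

```python
import random


def cumulative_cases(r0, generations=10):
    """Total cases after `generations` rounds if each case infects r0 others."""
    return sum(r0 ** g for g in range(generations + 1))


def simulate(draw_r0, runs=10_000, seed=0):
    """Run many scenarios, drawing one reproductive number per scenario."""
    rng = random.Random(seed)
    return sorted(cumulative_cases(draw_r0(rng)) for _ in range(runs))


def percentile(sorted_values, q):
    """Nearest-rank percentile of an already sorted list (q between 0 and 1)."""
    index = min(len(sorted_values) - 1, int(q * len(sorted_values)))
    return sorted_values[index]


# Assumed inputs: 2.4 is the fixed value cited in the article; the uniform
# 1.5-3.3 range is hypothetical and chosen only to be centered on 2.4.
fixed = simulate(lambda rng: 2.4)
variable = simulate(lambda rng: rng.uniform(1.5, 3.3))

for label, outcomes in (("fixed R0", fixed), ("variable R0", variable)):
    median = percentile(outcomes, 0.5)
    worst = percentile(outcomes, 0.95)
    print(f"{label}: median {median:,.0f} cases, 95th percentile {worst:,.0f} cases")
```

With a fixed value, every scenario is identical, so the median and the 95th percentile coincide. With a range, the upper percentile climbs well beyond the fixed-value estimate, because case counts rise steeply as the reproductive number increases.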

The researchers plan to explore variability at a more granular level to incorporate factors like age, weather and highly contagious “super-spreaders.” To encourage other statisticians to adopt this new approach, the authors have provided open access to their code and data at: https://github.com/donnate/heterogeneity_R0

“In a sense, our work is to put back the uncertainty in R₀ and say it's very variable and depends on many things,” Holmes said.

Story by Mark Shwartz, photos by Linda A. Cicero and Andrew Brodhead
