On 19 May 2020, John Ioannidis published a preprint estimating an infection fatality rate (IFR) for Covid-19 based on seroprevalence studies. He concludes the IFR is "relatively low" – less than half a percent and maybe even as low as 0.02% – which, he says, "may be in the same ballpark as the IFR of influenza". He argues that it could be "made even lower" by protecting "high-risk individuals and settings" rather than by broad population measures, and that with an IFR that low, doing so would be enough.

I don't think he's made a strong case for any of those 3 points. Here's why.


A. The study sample is biased


Although Ioannidis describes the scope of his analysis as population-based samples or those "that might approximate the general population", that's not reflected in the study design and results.

That's for 3 main reasons:

1. He included studies that he acknowledges elsewhere in the paper aren't population-based and don't "approximate the general population". The included studies he singles out as not being representative of the general population (blood donor studies and his own study in Santa Clara) are groups of people healthier and all-round at lower risk of death from Covid-19 than average. They account for 58.8% of the people in the total sample. I think there are even more included studies than that in the same boat, like a study among the students of a single high school in France and their parents, siblings, and teachers.

2. While he accepted the large group of blood donor studies, he made the design decision to exclude studies in healthcare workers. So at both the methodological level and the individual decision level, the study pool was skewed.

3. The search strategy was inadequate, and could have skewed his sample. The search strategy he reported wasn't complete (it didn't say how he narrowed it down to SARS-CoV-2/Covid-19 studies), and it wasn't particularly thorough in the terms he used, nor in the number of places he searched (PubMed and a subset database of PubMed called LitCovid, plus only 3 of the preprint servers). He did supplement this by asking some people with expertise if he had missed any eligible studies, and the 3 preprint servers are major ones. However, PubMed doesn't cover all journals (not all European journals, for example), and that could affect how well he found studies from that hard-hit continent.

In addition, the inclusion criteria (journal article or preprint) exclude studies reported in other forms of grey literature, like those of governments, universities, and research institutes. For example, the cut-off date for Ioannidis' searches was 12 May. The very next day a study was reported in a blaze of publicity on a Spanish government website (see the summary, the full study report in Spanish, and a detailed news report in English). Even if it had been published on the 11th instead of the 13th, it still would not have been eligible.

This one study from Spain dwarfs the study pool in Ioannidis' paper: there are just over 35,000 people in his preprint, and there were seroprevalence results for just over 60,000 people in the Spanish population study, with an IFR of 1.1%. There have been at least 2 others in populations with relatively high Covid-19 death rates since 12 May (a population study of 1,862 people in Luxembourg, and one of 789 blood donors in Milan). And there was the LA county study by the same group as the Santa Clara study, too. In just one week, this preprint became badly out of date.


B. Data methods appear to skew towards lower IFR


In a previous post where I assessed Ioannidis' calculations of Covid-19's case fatality rate in March, I pointed to the problems inherent in assuming all the deaths in a group had already occurred soon after infection. In that case – the Diamond Princess passengers and crew – it turned out that only half the deaths had occurred when he was writing. Other authors in March had taken this into account (it's called censoring), and their assessment turns out to have been roughly correct – whereas Ioannidis underestimated the fatality rate by about half.
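To make the censoring point concrete, here is a minimal sketch in Python. The function name and all the numbers are hypothetical, chosen only for illustration; the one thing taken from the case above is the assumption that about half of the eventual deaths had been recorded when the estimate was made.

```python
def censoring_adjusted_rate(deaths_so_far, infections, fraction_of_deaths_observed):
    """Scale up deaths counted so far to allow for deaths that have not happened yet.

    fraction_of_deaths_observed is an assumption: the share of eventual deaths
    (among people already infected) that had been recorded at the time of counting.
    """
    if not 0 < fraction_of_deaths_observed <= 1:
        raise ValueError("fraction_of_deaths_observed must be in (0, 1]")
    expected_final_deaths = deaths_so_far / fraction_of_deaths_observed
    return expected_final_deaths / infections


# Hypothetical outbreak: 1,000 infections, 10 deaths recorded so far.
naive = censoring_adjusted_rate(10, 1000, 1.0)     # treats the death count as final
adjusted = censoring_adjusted_rate(10, 1000, 0.5)  # assumes only half have occurred
print(f"naive: {naive:.1%}, adjusted: {adjusted:.1%}")
```

If only half the deaths have occurred, the naive figure is half the adjusted one, which is the size of underestimate described above.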

According to Gideon Meyerowitz-Katz via Twitter, some of the studies in this new preprint have made the same mistake, and Ioannidis didn't take that into account. (I haven't checked this.)

I didn't think it was at all clear what Ioannidis did with the data in this paper. For example, when he says what features he considered in adjusted analyses, he includes the category "other" (unspecified). (He includes a mixture of adjusted and unadjusted data.) For the studies that did not themselves calculate an IFR, he used data on deaths in the areas studied, but I couldn't find the source, so I can't check its reliability. There is great variability in the quality of death reporting in real time for Covid-19: for example, in areas where there weren't enough tests available to test every person who may have died of the disease, and a positive test was required to assign the diagnosis.

There are a few ways data analyses could skew towards lower IFRs: I've already raised 2 ways of under-estimating the number of deaths, which would shrink the numerator. Another is to over-estimate the number of people who were infected (the seroprevalence part), which would inflate the denominator of the IFR. People grappling with this on Twitter have suggested that this happened, too (see for example Cesar Lopez and Joe Robert). In addition, Meyerowitz-Katz tweeted that IFRs were corrected downwards depending on which antibody test was used, and he queried the basis for that. (I haven't tried to verify any of these criticisms.) Note: Meyerowitz-Katz has published a preprint review of Covid-19 IFRs with colleague Lea Merone using a very different approach. (They conclude the IFR is over half a percent, and that was with a cut-off date for studies of 25 April.)
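The direction of these two biases can be shown with a few lines of Python. This is only a sketch: every number in it is hypothetical, picked to show which way each error pushes the IFR, and none of it comes from the preprint.

```python
def ifr(deaths, infections):
    """Infection fatality rate: deaths divided by everyone infected."""
    return deaths / infections


# All numbers below are hypothetical, chosen only to show the direction of each bias.
true_deaths = 100
true_infections = 10_000

baseline = ifr(true_deaths, true_infections)

# Numerator too small: a fifth of the deaths are missed or have not happened yet.
undercounted_deaths = ifr(true_deaths * 0.8, true_infections)

# Denominator too large: infections are over-estimated by 50%.
overcounted_infections = ifr(true_deaths, true_infections * 1.5)

# Both errors at once: the two effects multiply.
both = ifr(true_deaths * 0.8, true_infections * 1.5)

for label, value in [
    ("baseline", baseline),
    ("deaths under-counted", undercounted_deaths),
    ("infections over-counted", overcounted_infections),
    ("both", both),
]:
    print(f"{label}: {value:.2%}")
```

Each error on its own lowers the estimate, and together they lower it further than either does alone.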

With a search cut-off date of the 12th and a sole-author preprint posted a week later, there wasn't a lot of time, or much of a process, for minimizing data errors. A person with a pseudonymous Twitter account pointed out that he listed the population of Oise (where that French school is) as 6 million people, when it's less than 1 million. (And she seems to be right.)
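A population figure matters because, when a study reports seroprevalence rather than an IFR, the number of infections has to be estimated from the seroprevalence and the size of the population. Here is a minimal sketch in Python, assuming the estimate is built that way. The death count and seroprevalence are hypothetical, and 1 million is used as an upper bound for the real population, so the true gap would be larger still.

```python
def inferred_ifr(deaths, seroprevalence, population):
    """IFR inferred from a survey: deaths / (share infected * population)."""
    estimated_infections = seroprevalence * population
    return deaths / estimated_infections


# Hypothetical inputs; only the two population figures relate to the text above.
deaths = 500
seroprevalence = 0.05

with_listed_population = inferred_ifr(deaths, seroprevalence, 6_000_000)
with_smaller_population = inferred_ifr(deaths, seroprevalence, 1_000_000)

# How many times too low the IFR is when the larger population is used.
# This ratio does not depend on the hypothetical deaths or seroprevalence.
understatement_factor = with_smaller_population / with_listed_population
print(f"IFR understated by a factor of about {understatement_factor:.0f}")
```

Whatever the deaths and seroprevalence are, overstating the population by a given factor understates the inferred IFR by the same factor.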

No doubt a lot of bytes will be spilled on debating these points in the coming days. And no doubt I'll be adding links to this section. However, I decided not to dig further into it, for 4 reasons. The first is that the Spanish study is so big, it makes any study of seroprevalence studies published before it dramatically out of date. It's the mirror image of the scenario I depict in this cartoon (explained here):



The second reason is that it's just too soon to get a good handle on the situation this way: the Spanish data can swing the picture so dramatically because it's such early days, and so many factors affect a mortality rate.

The third reason is that we already know that the concentration of severely ill people in a short period of time, in an outbreak that gets out of control, can overwhelm a healthcare system. We're now actually measuring what proportion of the entire populations of places like New York City have died.

And the fourth reason is that mortality data, as it often does, doesn't do justice to the amount of health harm this disease is causing. The potential burden of long-term morbidity is looking daunting. In May in New York City, for example, 0.6% (or 6 people per 1,000) of the entire population had been sick enough with Covid-19 to need hospitalization – and most of them were under 65. This disease is doing a lot of damage to people's bodies and emotional wellbeing, and we're only at the beginning of understanding how much long-term suffering this might mean – and how many later deaths, too.

Focusing on mortality also skews our perspective about how at risk of being harmed younger people are. Ioannidis argues that "the majority of deaths in most of the hard hit European countries have happened in nursing homes". For that, he cites a newspaper article that doesn't (and can't) substantiate that. Even if, after all the dust settles, the estimates in the article turn out to be right, it would only barely be a majority. He goes on to argue:

"The average length of stay in a nursing home is slightly more than 2 years and people who die in nursing homes die in an median of 5 months [23] so it is likely that COVID-19 nursing home deaths may have happened in people with life expectancy of only a few months."

The study he is citing here relates to fewer than 2,000 deaths in nursing homes in the US between 1992 and 2006: that's relatively small, and very out of date. What is counting as a nursing home in the Covid-19 data isn't at all clear, and that's going to be critical to understand. According to the CDC, 43% of people in nursing homes in the US are there for short stays, and nearly 20% of them are under the age of 65. The experience of these short-stay residents isn't reflected at all in the study Ioannidis cites, and if that study still reflects US nursing homes, he's underestimating people's life expectancy. Among that group of short-stay residents are people who aren't ready to go straight home after a hospital stay. A study of nearly 417,000 people from 2007 to 2009 found that 60% of them returned home.


C. No valid comparison data offered for influenza mortality


This has been key to Ioannidis' argument ever since March, when he calculated that only 10,000 Americans might die of Covid-19. In this preprint, he offers no reference for the influenza IFR he uses as a comparison: his data point for it is "0.1%, 0.2% in a bad year". But we don't know exactly how many people get infected with influenza viruses, just as we don't know how many people are infected with Covid-19. So the correct comparison here would be seroprevalence studies of seasonal influenza, with only deaths attributed to influenza in the numerator.

Ioannidis appears to be going on the basis of CDC's modeling estimate of what the mortality rate from symptomatic influenza might be – which is not even an IFR. And doing that, as Jeremy Faust has blogged, is comparing apples to oranges: there are, he wrote, only about 4 to 16 thousand deaths actually attributed to flu in the US in a flu season. (See also a published paper on this by Faust and Carlos del Rio.) As I'm writing this, the US is only days away from 100,000 deaths attributed to Covid-19, and it's still only early in this pandemic.


Can you shield just high-risk people enough to get a low mortality rate?


This argument is one Ioannidis ends his discussion with: because, as he puts it, Covid-19 mortality doesn't really affect the "non-elderly, non-debilitated", we could concentrate on protecting just those who are vulnerable. No one's shown how that would work, though. Healthcare workers are "non-elderly, non-debilitated", but they're vulnerable in an outbreak, as are the people working in care homes – and the people living with, or being in close contact with, any of them. Elderly people and those with the co-morbidities that make you vulnerable to a poor outcome – severe asthma, diabetes, heart disease etc – are everywhere too, living and/or working with/caring for the "non-elderly, non-debilitated".

At another point, he writes: "COVID-19 seems to affect predominantly the frail, the disadvantaged, and the marginalized – as shown by high rates of infectious burden in nursing homes, homeless shelters, prisons, meat processing plants, and the strong racial/ethnic inequalities against minorities in terms of the cumulative death risk". In the US, that's an awful lot of people – including a high proportion of workers so essential and so financially disadvantaged, they can't stay out of harm's way.

Ioannidis concludes that his message is good news, but I think that's a mirage. It doesn't square with what this virus achieved in Wuhan, in northern Italy, in Spain, in New York City, in England... There's no easy escape in this pandemic. Protect a whole community well enough, or keep tight containment of outbreaks in it, and deaths can be kept low. Fail to contain it, though, and this virus will take a heavy toll.

Postscript: There is now a sequel to this post, which looks at 2 other preprints published at roughly the same time, on the same question (including the one by Meyerowitz-Katz mentioned above).

Hilda Bastian

First posted on 20 May 2020.

Last updated 26 May 2020.


Update 22 May 2020: Corrected the CDC estimate to being the rate of death per symptomatic illness. I originally incorrectly called it an IFR. My thanks to @OnoNoKomachi1 for pointing this out.

Update 24 May 2020: Added link to 23 May sequel.

Update 26 May 2020: Added section on nursing homes. My thanks to Chad Loder for pointing out issues with this part of the preprint, including the study on 60% of people discharged from hospital into nursing homes returning home. 


Disclosures: I am 59 years old. A close family member who is one of the people I care about most in the world is young and immuno-suppressed, and 2 of the others in the same category are high risk for other reasons. I have written about Covid-19 at WIRED, and at my own blog at PLOS Blogs, Absolutely Maybe. I wrote a very critical post in rebuttal of Ioannidis' STAT News essay on Covid-19 here on my personal website on 18 March 2020, with a postscript on his estimation and discussion of the disease's case fatality rate on 21 March.

