Do Data Reveal All?

Competing trial regimens for the COVID vaccine reveal new perspectives on the philosophy of data analysis

The COVID-19 vaccines have finally been launched. After a frenzied race against time and the market, a handful of pharma companies came up with three or four effective vaccine candidates on which the world is pinning its hopes. The process was not without its share of controversies, and much has circulated in the media about how several steps might have been compressed to meet near-impossible deadlines that were probably dictated by policymakers.

To dispel doubts and worries, all manufacturers have been highlighting data derived from trial findings in support of the efficacy rates of their candidate vaccines. There was a slew of such announcements in November 2020, as preparations for marketing the vaccines were going on in full swing.

Data are the only reliable index in any scientific research, and in drug trials their role is crucial. While any data on the COVID-19 vaccines would be epoch-making, the current trial regimens have revealed new perspectives on the philosophy of data analysis. The way the efficacy numbers were presented in the press releases can also provide valuable lessons for managers for whom data-based decisions are part of the daily grind. Let us consider the top three insights gathered in the process.

  1. The size of the database may not have any bearing on analytics

Pfizer and BioNTech announced in early November that their vaccine candidate had displayed over 90% effectiveness in randomized controlled trials. As per the data released by the two firms, more than 43,000 volunteers from various backgrounds had taken part in the trials, which looked like a convincing figure covering a wide sample population. The rate of effectiveness was also remarkable, because WHO guidelines require just a 50% success rate for a vaccine to be termed effective. Thus, "more than 90% effective on 43,000 people" looks like a piece of data that indicates great success. However, data interpretation is not really that straightforward, as our discussion will reveal.

Let us consider how the efficacy percentage was calculated. The steps involved were as follows:

  1. Counting how many people in the vaccinated group contracted the COVID-19 infection (number A).
  2. Counting how many people in the placebo group contracted the COVID-19 infection (number B).
  3. Dividing number A by number B.
  4. Subtracting the quotient from 1. The result is the efficacy rate.

In the Pfizer-BioNTech trials, 8 vaccinated people contracted the infection, compared to 86 on placebo. Thus, 8/86 = 0.093, which, subtracted from 1, comes to 0.907 or 90.7%. That 0.7 stands for the “more” part in “more than 90% effective”.
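The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the manufacturers' actual analysis: the function name `vaccine_efficacy` is our own, and the shortcut of comparing raw case counts assumes the vaccinated and placebo groups are of roughly equal size.

```python
def vaccine_efficacy(cases_vaccinated: int, cases_placebo: int) -> float:
    """Return 1 - (cases among vaccinated / cases among placebo).

    Assumption: both trial groups are of roughly equal size, so raw
    case counts can be compared directly.
    """
    if cases_placebo <= 0:
        raise ValueError("the placebo group must have at least one case")
    return 1 - cases_vaccinated / cases_placebo


# Figures quoted in the article: 8 cases among the vaccinated, 86 on placebo.
pfizer_efficacy = vaccine_efficacy(8, 86)
print(f"{pfizer_efficacy:.1%}")  # prints 90.7%
```

Note that the function takes only the two case counts as input; the total number of volunteers never enters the calculation.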

Let us not forget that in the above calculation the total sample size, however large it might be, had no direct bearing on the efficacy rate. (The shortcut of comparing raw case counts does assume that the vaccinated and placebo groups are of roughly equal size.) It was all about the number of persons who actually got infected. In this study, we were dealing with a total of 94 (8 + 86) confirmed cases among 43,000 people. The efficacy of the vaccine would have turned out exactly the same if the same number of infections, distributed likewise between the vaccinated and placebo groups, had occurred in a total sample of just 200 people!

Of course, the findings still hold good. A ratio of 8 to 86 in a randomized controlled trial is nearly impossible to obtain by chance alone; it was almost certainly achieved through the vaccine. We just intend to highlight the fact that a huge sample size may look and sound reassuring, but in reality it may not have any mathematical relevance to the outcome.
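To get a rough feel for how unlikely an 8-to-86 split is, one can ask: if the vaccine did nothing and the two groups were equally sized, each of the 94 cases would be equally likely to land in either group. The sketch below computes the probability of seeing 8 or fewer cases in the vaccinated group under that assumption. It is a simplified illustration (a one-sided binomial tail with a hypothetical helper name), not the statistical method used in the actual trial.

```python
from math import comb


def binomial_tail_at_most(k: int, n: int, p: float = 0.5) -> float:
    """Probability of at most k successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Assumption: with no vaccine effect and equal group sizes, each of the
# 94 cases falls in the vaccinated group with probability 0.5.
chance_probability = binomial_tail_at_most(8, 94)
print(f"{chance_probability:.2e}")
```

The result is vanishingly small, which is why so lopsided a split is attributed to the vaccine rather than to luck.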

  2. Exact numbers may not lend extra credence to data

In the second week of November, it was the turn of the Moscow-based manufacturer of the Sputnik V vaccine to release its trial results. The Gamaleya National Research Centre for Epidemiology and Microbiology announced an efficacy rate of 92% in a trial involving 40,000 volunteers.

Close on its heels, US pharmaceutical major Moderna declared trial results for its vaccine candidate too: it was 94.5% effective in a trial involving a sample size of over 30,000 people.

In both these announcements, the point to be noted is the exactitude of expression: 92% and 94.5%, not “over 90” or “above 94”, unlike the “more than 90%” of the Pfizer-BioNTech announcement. It is a general human tendency to assume that anything precise and exact is closer to “the truth” or “the ultimate”. Thus, announcements with precise percentages might command greater credence and appear more reliable, whether or not that is the case.

While this in no way undermines the scientific achievements of Gamaleya or Moderna, it goes to confirm that “how” numbers are presented can flavour the message that is being communicated. In business communications, and especially in promotional material, this strategy is routinely employed. However, precision is not always synonymous with perfection, though it can very well serve as a tool for persuasion. So much so that in our present example, the Belgian newspaper De Standaard went ahead to write that “the candidate vaccine of the American biotech company Moderna works even better than that of Pfizer”, although no such claim had ever been made by the company itself.

Highly precise numbers can be overwhelming; they impress and dazzle. That can impair the human ability to interpret data in its correct perspective, which is crucial in making informed decisions. A far more balanced approach when dealing with estimates is to make a clear distinction between precisely reported numbers and high-quality data, judging both on merit and in context.

  3. Better not to work your way backwards while analysing data

In the last week of November 2020, AstraZeneca came forward with findings from its own vaccine trials. The trial involved more than 11,000 people, and the efficacy rate of the vaccine was announced to be 70%. This might surprise us, because it sounds too low when compared to the previous figures. However, the AstraZeneca study used two different dosing regimens, of which the half-dose regimen, administered to a subset of 2,741 participants, proved to be 90% effective, nearly on par with the other candidates mentioned above.

But is it really? AstraZeneca later confirmed that the 90% estimate was based on 33 reported cases of infection in the half-dose group, of which 3 had received the actual vaccine and the other 30 were on placebo. Overall, the AstraZeneca trial reported 131 confirmed infections, leading to the 70% efficacy.
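The 90% figure follows from the same arithmetic as before: 1 − 3/30 = 0.9. The short sketch below reproduces it and then shows how much the estimate would move if just a few more vaccinated participants had fallen ill. The extra cases are purely hypothetical scenarios for illustration, not reported data, and the calculation again assumes equally sized vaccine and placebo groups within the half-dose subset.

```python
def efficacy_from_cases(cases_vaccinated: int, cases_placebo: int) -> float:
    """Efficacy as 1 - A/B, assuming equally sized groups."""
    return 1 - cases_vaccinated / cases_placebo


# Reported half-dose figures from the article: 3 vaccinated, 30 placebo.
reported = efficacy_from_cases(3, 30)
print(f"reported: {reported:.1%}")

# Hypothetical what-if scenarios (NOT trial data): a few extra cases
# among the vaccinated, with the placebo count held at 30.
what_if = {extra: efficacy_from_cases(3 + extra, 30) for extra in (1, 2, 3)}
for extra, value in what_if.items():
    print(f"+{extra} vaccinated case(s): {value:.1%}")
```

With so few cases, each additional infection among the vaccinated shifts the estimate by more than three percentage points, which is one reason a headline figure resting on 33 cases deserves a cautious reading.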

Further revelations emerged: it turned out that the half-dose regimen was not intentional but rather an inadvertent mistake by a participating partner. Moreover, AstraZeneca admitted that it had combined results from two differently designed trials in two geographies, the UK and Brazil, which, again, is not standard practice. It appears that the company had tried its best to proceed with the analysis despite the bloopers and salvage the situation.

This situation is a perfect example of a dilemma that data analysts often face. They can either formulate a hypothesis first and then gather and analyse data to test whether that hypothesis holds true, or they can first collect whatever data is available and then structure a hypothesis around the findings of the analysis. For scientific experiments, the first approach is more or less universally followed. The second approach can radically increase the chances of false positives, leading to erroneous conclusions. The latter approach is more tempting, but it leaves the door wide open to chance and will not hold its ground against the time-tested former approach.
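Why does hunting through the data after the fact raise the risk of false positives? A simple way to see it, under the simplifying assumption that each comparison is independent and tested at the conventional 5% significance level, is that the chance of at least one spurious "finding" among k comparisons is 1 − (1 − 0.05)^k. The function name and the choice of 5% below are illustrative assumptions, not something taken from the trials discussed.

```python
def chance_of_false_positive(num_comparisons: int, alpha: float = 0.05) -> float:
    """Probability that at least one of several independent tests
    comes out 'significant' purely by chance."""
    return 1 - (1 - alpha) ** num_comparisons


# One pre-specified hypothesis versus many after-the-fact comparisons.
for k in (1, 5, 10, 20):
    print(f"{k:>2} comparisons: {chance_of_false_positive(k):.0%}")
```

A single hypothesis stated in advance carries a 5% risk of a chance result; slicing the same data twenty different ways pushes that risk to well over half.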
