Friday, September 21, 2018

The Magical Transformation of a Secondary into a Primary Outcome



Reading a scientific study should reach beyond the article itself, encompassing the ecosystem in which the idea was conceived, the protocol was defined and the results were accepted by the community. The reading of a work neither begins nor ends with the final publication.

In a recent post, we reflected on the uncertain result of the SCOT-HEART clinical trial. That analysis was based solely on my reading of the article published in the NEJM. In the present post I will go further, beyond the final publication.

In the journal club of my cardiology department, we use a particular methodology for reading articles. One of its features is instructing our residents to systematically check clinicaltrials.gov for inconsistencies between the protocol defined a priori and what appears in the published article. We are constantly evaluating the ecosystem that precedes the article.

That was when our chief resident, Dr. João Menezes, came up with another surprise about the SCOT-HEART trial: the primary outcome reported in the NEJM publication was actually one of many secondary outcomes, exemplifying "the magical transformation of a secondary into a primary outcome".


The transformation

The scientific integrity of a study depends on the a priori definition of its data analysis. This discipline exists to avoid the multiplicity of tests that would inflate the probability of a type I error. In this context, it is essential to define the study's primary outcome, which should guide the conclusion, rather than relying on positive secondary outcomes that may suffer from the multiple comparison problem.

The publication of SCOT-HEART in the NEJM clearly states: "The primary endpoint was death from coronary heart disease or nonfatal myocardial infarction at 5 years." And, in the authors' words:

"Our pre-specified primary long-term endpoint was the proportion of patients who died from coronary heart disease or had a nonfatal myocardial infarction at 5 years."

Let us now go to the ecosystem prior to the article. As we know, authors should register the protocol of any clinical trial before its execution, and this is usually done at clinicaltrials.gov.

Upon checking the study protocol on clinicaltrials.gov, João realized that the primary outcome described in the NEJM was not a true primary outcome! As if by a magic spell, a previously secondary outcome had been made primary in the final article.

In fact, this study was originally designed to evaluate the proportion of patients who received a diagnosis of coronary disease, comparing the tomography strategy with the control strategy. This proportion was the pre-defined primary outcome of the study.

The secondary outcomes were divided into 5 domains (symptoms, diagnosis, additional investigations, treatment implemented, long-term clinical outcomes). In the domain of clinical outcomes, 9 secondary endpoints were described, among which is the "cardiovascular death and nonfatal infarction" endpoint, now presented as primary in the NEJM article.

Look at the description of the secondary clinical outcomes as set out on clinicaltrials.gov and in the 2012 Trials article describing the study design:

(i) Cardiovascular death or non-fatal myocardial infarction (MI)
(ii) Cardiovascular death
(iii) Non-fatal MI
(iv) Cardiovascular death, non-fatal MI or non-fatal stroke
(v) Non-fatal stroke
(vi) All-cause death
(vii) Coronary revascularisation: percutaneous coronary intervention or coronary artery bypass graft surgery
(viii) Hospitalisation for chest pain, including acute coronary syndromes and non-coronary chest pain
(ix) Hospitalisation for cardiovascular disease, including coronary artery disease, cerebrovascular disease and peripheral arterial disease

To complicate matters further, the clinical outcomes were pre-defined for evaluation at 10 years of follow-up, while the article reports a 5-year follow-up. Thus, the 5-year follow-up was not an a priori definition. Strictly speaking, we are faced with a secondary outcome defined a posteriori (a post-hoc analysis). This is not just semantics: in the absence of a definition of when the outcome should be evaluated, one can test it year after year, hoping that chance will deliver a positive result at some point. The moment the author is gifted by chance, he can prepare an abstract and submit it to an important international congress. I am not saying that this is how it was done; I am just showing what can be done with post-hoc endpoints.
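To make the danger of such repeated looks concrete, here is a minimal simulation sketch with assumed numbers (not the SCOT-HEART data): a trial in which there is no true effect, analyzed once a year for ten years, counting how often at least one of those looks crosses p < 0.05 by chance alone.

```python
# A rough simulation with assumed numbers (not the SCOT-HEART data):
# two arms with no true difference, outcome checked every year for 10 years.
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(0)
n_per_arm, years, yearly_hazard, n_sims = 2000, 10, 0.01, 2000

hits = 0
for _ in range(n_sims):
    # time-to-event (in years), identical distribution in both arms: the null is true
    t_a = rng.exponential(1 / yearly_hazard, n_per_arm)
    t_b = rng.exponential(1 / yearly_hazard, n_per_arm)
    for y in range(1, years + 1):
        a, b = (t_a <= y).sum(), (t_b <= y).sum()
        table = [[a, n_per_arm - a], [b, n_per_arm - b]]
        p_value = chi2_contingency(table)[1]
        if p_value < 0.05:  # one "positive" look is enough to write the abstract
            hits += 1
            break

print(f"P(at least one falsely 'positive' yearly look) = {hits / n_sims:.0%}")
# With these assumptions, the chance is well above the nominal 5%.
```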

In this way, we are facing a serious problem of multiple comparisons, which can be computed as follows:

Considering a 5% alpha, if the null hypothesis is true (tomography group = control group), the probability of a false-positive result for a single primary outcome is 5%. However, nine secondary attempts are being made to obtain a positive result. If each attempt carries a 5% probability of a false positive, the probability of a false positive appearing in at least one of the outcomes is 1 - 0.95^K, where K is the number of tests. Thus, the likelihood that at least one of these nine secondary outcomes is falsely positive is about 37%, much higher than the 5% we would face when analyzing a single primary outcome.
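The arithmetic can be checked in a few lines (a sketch that assumes the nine endpoints are statistically independent; correlation between them changes the exact figure, but not the message):

```python
# Family-wise probability of at least one false positive among k independent
# tests, each performed at alpha = 0.05, when the null hypothesis is true.
def familywise_error(k, alpha=0.05):
    return 1 - (1 - alpha) ** k

for k in (1, 5, 9):
    print(f"{k} endpoint(s): {familywise_error(k):.0%}")
# 1 endpoint(s): 5%
# 5 endpoint(s): 23%
# 9 endpoint(s): 37%
```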


To make matters worse, the statistical power of SCOT-HEART, after correction for the actual incidence of the outcome, is only 27%, as we mentioned in the previous post. We thus have two mechanisms for randomly manufacturing a false positive: multiple endpoints tested and a study that lacks statistical power. Taken together, they mean that a positive finding from this trial is even more likely to be spurious than the 37% figure above suggests. Third, if we consider the risk of outcome ascertainment bias (events captured from electronic medical records, not adjudicated), SCOT-HEART becomes a machine for generating false results, both random and systematic.
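A back-of-the-envelope calculation shows how low power compounds the problem. The prior probability below is an assumption of mine, purely for illustration: even granting the hypothesis a generous 50% chance of being true before the trial, with 27% power and a multiplicity-inflated alpha of roughly 37%, a "positive" result is more likely to be false than true.

```python
# Illustrative only: probability that a "positive" result is a false positive,
# given an assumed prior probability that the hypothesis is true.
def prob_false_given_positive(prior_true, power, alpha):
    true_positives = prior_true * power
    false_positives = (1 - prior_true) * alpha
    return false_positives / (true_positives + false_positives)

print(f"{prob_false_given_positive(prior_true=0.50, power=0.27, alpha=0.37):.0%}")
# = 58%: under these assumptions, a positive finding is more likely false than true.
```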

This is one more explanation for the implausible 41% relative reduction in the hazard of the combined outcome of infarction and cardiovascular death at 5 years of follow-up after coronary CT. As discussed in the previous post, the prevention of a clinical outcome by performing a test depends on the product of three conditional probabilities: P(abnormal finding) × P(change in treatment) × P(benefit from the changed treatment). This differs from the benefit of a treatment, which has a single component. To make matters worse, the gradient of treatment change between the groups was only 4%.
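The dilution is easy to illustrate with hypothetical numbers (the first two probabilities below are assumptions of mine; only the 4% treatment-change gradient comes from the trial):

```python
# Purely illustrative: the benefit of ordering a test is the product of
# conditional probabilities, unlike a drug, whose effect has a single component.
p_abnormal_finding = 0.25       # assumed: P(test shows a finding that could change conduct)
p_change_given_finding = 0.16   # assumed: P(treatment actually changes | such a finding)
rrr_of_added_treatment = 0.30   # assumed: relative risk reduction of the added therapy

# The between-group treatment-change gradient folds the first two steps together:
treatment_gradient = p_abnormal_finding * p_change_given_finding  # = 4%, as in SCOT-HEART
expected_rrr = treatment_gradient * rrr_of_added_treatment
print(f"Expected relative risk reduction = {expected_rrr:.1%}")   # = 1.2%, nowhere near 41%
```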

In this light, it is too good to be true that performing a test would yield a benefit with the usual magnitude of good treatments, which typically ranges from 20% to 40%. Here we refer to the relative reduction because it describes the intrinsic "effect size" of a treatment, which does not vary with absolute risk. For example, the relative risk reduction in mortality for heart failure with enalapril or a beta-blocker is 16% and 35%, respectively.
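A quick worked example of why the relative reduction is the "intrinsic" measure: holding the relative risk reduction fixed at 35% (the beta-blocker figure), the absolute reduction shrinks as the baseline risk falls, while the relative effect stays the same.

```python
# Same relative risk reduction applied to different baseline risks:
# the relative effect is constant; the absolute benefit is not.
rrr = 0.35  # e.g., ~35% relative reduction in mortality with beta-blockers in heart failure
for baseline_risk in (0.20, 0.10, 0.02):
    treated_risk = baseline_risk * (1 - rrr)
    absolute_reduction = baseline_risk - treated_risk
    print(f"baseline {baseline_risk:.0%} -> treated {treated_risk:.1%} "
          f"(absolute reduction {absolute_reduction:.1%})")
```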


Two Scandals

It is scandalous for the authors to describe as primary an outcome that was pre-defined as secondary. This shows the lack of scientific integrity behind the scenes of this work.

Perhaps more shocking is the acceptance of this article by the medical community, which seemed to celebrate the result of the work, featured prominently at the European Society of Cardiology Congress in Munich.

Problems of scientific integrity are not the property of a morally defective individual. Lack of scientific integrity stems from a faulty ecosystem, spanning the producers of research, editors and reviewers, and those who read the article without the necessary critical eye.


The Legion Bias

We might find it very strange that thousands of cardiologists, simultaneously attending the presentation of the study at the ESC congress, agreed with the doubtful result. Should the number of enthusiasts count as evidence in favor of a study's truthfulness?

It is worth recalling the observations of the Swedish physician and statistician Hans Rosling, who became famous for his TED lectures, in which he used dynamic statistical graphs to show how most people are wrong about important facts of life.

Rosling used to ask questions like this to legions of intellectuals: "How many children in low-income countries have basic education? 20%, 40%, or 60%?" The correct answer is 60%, but only 7% of the intellectuals answered correctly; most chose 20%. Note that if we asked a monkey to pick an alternative, it would be right 33% of the time. Why do sapiens get it right only 7% of the time? The answer lies in our extreme bias. We tend to believe in the most significant result, whether we are talking about a risk factor or the beneficial effect of a treatment. Our mind has a tropism towards the highest possible contrast, leading us to choose the most extreme result.

It is a collective phenomenon, creating a legion of believers in the most significant result. The immense number of people thinking the same way reinforces the belief of each participant in the legion. This is the legion bias.

The problem worsens when we are medical specialists, enthusiastic about our technological tools. This explains the trust the medical community placed in small, biased studies of hypothermia after cardiac arrest and of beta-blockers in non-cardiac surgery, which became guideline recommendations, or in hormone replacement therapy based on observational studies. The same is true for SCOT-HEART, which, presented with glamour at the European Society of Cardiology meeting, created its own legion of believers.


The Novelty, Positivism and Confirmation Biases

SCOT-HEART is the more recent study, so it appears as a novelty that advances knowledge. However, another study had already been published years earlier: the PROMISE trial, a larger study (10,000 patients), with a true primary outcome defined a priori and with adjudicated outcomes during follow-up. In other words, PROMISE is a study of immensely superior quality to SCOT-HEART. And its result was negative.

Why, then, do we prefer to believe positive evidence of poor quality rather than negative evidence of good quality? Because our mind has a tropism for the positive (positivism bias) and for the new (novelty bias). We then use confirmation bias (selecting the positive evidence and disregarding the negative) to reinforce our belief.

By invoking the cognitive biases of the biological mind, we do not even need to be rude and mention the possible conflicts of interest that can also move the legions of believers.


The King Who Was Naked

Hans Christian Andersen tells the story (1837) of a very vain king who ordered from two tailors an unprecedented outfit, so original that no one had ever worn anything like it. Unable to fulfill the king's desire, the tailors devised an imaginary costume, which they claimed was invisible to the eyes of stupid people. The king himself, when he tried on the clothes, could not see them in the mirror, but pretended to, so as not to look stupid. In the same way, everyone could see that the king was naked, but no one pointed it out for fear of being considered stupid. And so the king spent much of his reign naked, exposed to ridicule. The fear of looking stupid made people accept the unbelievable. In fact, many believed they were seeing the clothes, because they wanted to believe they were not stupid.

This story portrays the mechanism by which some myths persist in medicine.

One fine day, during an important parade in the public square, a child who saw the king passing by cried out: the king is naked! That child unmasked the charade created by the tailors, embarrassing the king and, especially, the subjects who had believed the lie or had been ashamed to disagree.

Some interpret the child's observation as a product of innocence. In fact, he was one of those slightly mischievous children. In this case, what separated the child from the adults was the courage to acknowledge the truth and disagree with the legion of fanatics.

Let SCOT-HEART alert us to the multiple biases that keep us from scientific integrity.
