Monday, August 27, 2018

SCOT-HEART Trial: how to spot scientific fake news at a glance




A great example of scientific fake news, the SCOT-HEART Trial, was just presented at the ESC Congress and simultaneously published in the NEJM.

I use this example to show that reading an article starts before the traditional process. A pre-reading should give us the critical spirit necessary for the reading proper. During pre-reading we begin to develop a vision of the whole, as if we were looking at a city from an airplane window.

Then we'll land the plane and start reading to assess details.

The pre-reading of an article is composed of two questions: first, does the hypothesis make sense, and should this study have been carried out? (pre-test probability of the idea = plausibility + previous studies); second, is the result too good to be true (effect size)?

During pre-reading, we should avoid flooding our heads with details. We only need to identify the tested hypothesis and the main result. By reading just the conclusion of the article we get this information, which should be accompanied by a look at the line of results presenting the main numbers, to get a notion of the effect size (it takes 30 seconds).

In the case of SCOT-HEART trial:

 "CTA in addition to standard care in patients with stable chest pain resulted in a significantly lower rate of death from coronary heart disease or nonfatal myocardial infarction at 5 years than standard care alone.

The 5-year rate of the primary end point was lower in the CTA group than in the standard care group (2.3% [48 patients] vs. 3.9% [81 patients]; hazard ratio, 0.59; 95% confidence interval [CI], 0.41 to 0.84; P = 0.004)."

From these two sentences, we identify the hypothesis tested: the use of tomography in patients with stable chest pain reduces cardiovascular events. What is the pre-test probability of this idea?

There is some plausibility, to the extent that anatomical information can modify physicians' therapeutic behaviour and thereby modify outcomes. Regarding previous evidence, the PROMISE study randomised 10,000 patients to tomography versus noninvasive evaluation and was negative for cardiovascular outcomes. The PROMISE control group is not exactly the same as SCOT-HEART's, but indirectly the result of that study lowers the pre-test probability of the SCOT-HEART hypothesis. Therefore, I would say that the pre-test probability is low but not zero, which still justifies performing the study.

Then comes the second question: is the effect size too good to be true? Note that the CT scan promoted a 41% relative hazard reduction. This magnitude of effect is typical of beneficial treatments. It is important to note that the effect size of a test will always be much smaller than that of a treatment, since with a test there are many more steps between intervention and outcome.

In the case of a clinical trial testing efficacy of a test, the following steps are necessary before the benefit occurs:

The examination is done on all patients; a portion of them has a result that may suggest to the physician that the patient's treatment should be improved; in a sub-portion of these patients the physician actually enhances the treatment; and a sub-sub-portion of those patients benefits from the treatment improvement. Therefore, we should expect the magnitude of the clinical effect of a test to be much lower than that of a treatment.
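This cascade can be made concrete with a back-of-the-envelope multiplication. The fractions below are purely illustrative assumptions of mine, not SCOT-HEART data; they only show how chaining conditional steps shrinks the achievable effect of a test.

```python
# Hypothetical fractions for each step of the cascade (illustrative only,
# none of these numbers comes from SCOT-HEART):
p_actionable_result = 0.50  # patients whose result suggests changing therapy
p_physician_acts    = 0.50  # of those, patients whose physician actually acts
rrr_of_treatment    = 0.25  # relative risk reduction of the therapy itself

# The test only delivers the treatment's effect to patients who pass through
# every step, so its overall relative risk reduction is the product:
rrr_of_test = p_actionable_result * p_physician_acts * rrr_of_treatment
print(f"Overall RRR of the test: {rrr_of_test:.1%}")  # well below the drug's 25%
```

With any plausible fractions, the test's effect is a small fraction of the underlying treatment's, which is exactly why a 41% reduction from an exam is suspicious.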

In this way, we conclude that the SCOT-HEART result is too good to be true. It would be extraordinary for a test to promote such an effect size. As Carl Sagan said, "extraordinary claims require extraordinary evidence". Is the quality of this trial extraordinary?

Now let's read the article, looking for problems that might explain such an unusual finding: a 41% relative reduction in hazard from performing an exam.

The first point that draws attention is the minimal difference in treatment modification promoted by the CT scan versus the control group. There was no difference in revascularization procedures. Regarding preventive therapies such as statins or aspirin, the difference between the two groups was only 4 percentage points (19% versus 15%).

The CT scan group had 2,073 patients; 2,073 × 4% ≈ 83, so the CT scan group had an additional 83 patients with improved therapy relative to control.

The number of events prevented in the CT group (relative to the control group) was 33.

Thus, drug enhancement in 83 patients prevented 33 clinical outcomes. If we attributed this effect to the treatment performed at the end of the cascade, its NNT would be 83/33 ≈ 2.5. Something unprecedented, which almost no real treatment is able to promote, let alone a test.
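The arithmetic above can be verified directly. All numbers are the ones quoted in this post (2,073 patients in the CT arm, a 4-percentage-point difference in preventive therapy, 81 versus 48 primary events):

```python
# Figures quoted above from SCOT-HEART
n_cta_group        = 2073  # patients in the CTA arm
extra_treated_frac = 0.04  # 19% vs 15% on preventive therapy
events_standard    = 81    # primary events, standard-care arm
events_cta         = 48    # primary events, CTA arm

extra_treated    = round(n_cta_group * extra_treated_frac)  # ~83 patients
events_prevented = events_standard - events_cta             # 33 events

# If all the benefit flowed through those extra-treated patients,
# the implied NNT of the underlying therapy would be:
implied_nnt = extra_treated / events_prevented
print(extra_treated, events_prevented, round(implied_nnt, 1))  # 83 33 2.5
```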

This is definitely a false result.

Continuing to read will help us understand the mechanisms that generated this false result.

"There were no trial-specific visits, and all follow-up information was obtained from data collected routinely by the Information and Statistics Division and the electronic Data Research and Innovation Service of the National Health Service (NHS) Scotland. These data include diagnostic codes from discharge records, which were classified according to the International Classification of Diseases, 10th Revision. There was no formal event adjudication, and end points were classified primarily on the basis of diagnostic codes."

So the outcomes were obtained through electronic record review, via ICD codes and without adjudication by the authors. Second, the study was open-label, so ascertainment bias can happen. For example, knowledge of a normal CT scan may influence the doctor who writes the ICD code to interpret a symptom as innocent, while in another patient whose anatomy is unknown, the same symptom may prompt troponin measurement and a subsequent diagnosis of nonfatal infarction. This is just a potential explanation, which serves as an example.

In fact, we are never able to open the black box and identify the exact mechanism that generated a bias. However, it should be borne in mind that the combination of an open-label design with an inaccurate method of outcome measurement leads to a high risk of bias.

One of the techniques to explore the possibility of ascertainment bias is to compare the cause-specific death outcome (subject to ascertainment bias because of its subjectivity) with death from any cause (immune to this bias). Even though neither is a primary or statistically significant outcome, this is worthwhile as an exploratory analysis. It is interesting to note that the hazard ratio is 0.46 for cardiovascular death and 1.02 (null) for all-cause death. In the absence of a substantial increase in non-cardiovascular death, this discrepancy suggests that the study is especially subject to ascertainment bias for subjective outcomes.

In addition, the study presents a high risk of random error, since it is underpowered. In fact, the sample size calculation was based on the premise of a 13% incidence of the outcome in the control group, but only 3.9% took place. By my calculation, this reduced the planned statistical power of 80% to 32%. As we know, small studies are more predisposed to false positive results because of their imprecision.
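My 32% figure can be approximately reproduced with a standard two-proportion power formula. The sketch below assumes roughly 2,073 patients per arm and that the trial still aimed to detect the originally planned relative reduction (2.8/13 ≈ 21.5%), but at the observed control event rate of 3.9%; these are my assumptions for the calculation, not numbers stated explicitly in the paper.

```python
from math import sqrt
from statistics import NormalDist

# Assumed inputs (see above): not all are explicit in the paper.
n_per_arm   = 2073                     # approximate size of each arm
p_control   = 0.039                    # observed 5-year event rate, control arm
rrr_planned = 0.028 / 0.13             # planned relative risk reduction (~21.5%)
p_cta = p_control * (1 - rrr_planned)  # event rate implied in the CTA arm

# Normal-approximation power for a two-sided alpha = 0.05 comparison
z_alpha = NormalDist().inv_cdf(1 - 0.05 / 2)
se = sqrt(p_control * (1 - p_control) / n_per_arm
          + p_cta * (1 - p_cta) / n_per_arm)
z_effect = (p_control - p_cta) / se
power = NormalDist().cdf(z_effect - z_alpha)
print(f"power = {power:.0%}")  # roughly 31%, near the 32% quoted above
```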

This imprecision not only increases the probability of type I error, but also prevents the study from measuring the size of the effect with precision. That is, the 41% relative hazard reduction has a confidence interval ranging from 16% to 59%.
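These bounds follow directly from the hazard-ratio confidence interval quoted in the abstract (HR 0.59; 95% CI, 0.41 to 0.84): the relative reduction is simply 1 − HR at each end.

```python
# HR and 95% CI as quoted in the NEJM abstract
hr, ci_low, ci_high = 0.59, 0.41, 0.84

# Relative hazard reduction implied by the point estimate and each bound
rrr_point = 1 - hr       # 41% reduction
rrr_best  = 1 - ci_low   # up to a 59% reduction
rrr_worst = 1 - ci_high  # as little as a 16% reduction
print(f"{rrr_worst:.0%} to {rrr_best:.0%}")  # 16% to 59%
```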

Finally, even if we considered the information true, an analysis of applicability would be warranted. The hypothesis tested here is of a pragmatic nature: an intervention is done at the beginning, and we expect the physician to react in a way that benefits the patient. However, the protocol was designed to systematically influence physicians' behaviour.

"When there was evidence of nonobstructive (10 to 70%) cross-sectional luminal stenosis or obstructive coronary artery disease on the CTA, or when a patient had an ASSIGN score of 20 or higher, the attending clinician and primary care physician were prompted by the trial coordinating center to prescribe preventive therapies. "

This methodology reduces the external validity of the study, because we do not know whether, in the absence of this protocol-driven prompting, doctors would act the same way. If the benefit were real, in practice it would be of smaller magnitude.

For studies of insufficient quality we should keep uncertainty in mind. But SCOT-HEART goes further: this study is certainly false. A great example of fake scientific news.

3 comments:

  1. Dear Luis,

    While I agree with you most of the time and while I am grateful to you for your great service to the medical/scientific community, I wholeheartedly disagree with your assessment of the SCOT-HEART trial.

    To be specific regarding your points:

    1) What is the pre-test probability of the study hypothesis?
    As you pointed out, it is scientifically sound to hypothesize that identifying patients with coronary atherosclerotic disease which is detected by CT but not by stress testing allows one to implement preventive measures that lead to lower risk of myocardial infarction and CV death. As such, the pre-test probability is reasonably high to warrant testing. At the time of trial planning, there were no prior data directly applicable. The PROMISE study was conducted at the same time as SCOT-HEART and results were not available then. From today’s perspective, the results of PROMISE would further strengthen the case of SCOT-HEART since a trend of lower risk of MI was evident in PROMISE but the numbers were too small to be conclusive.

    2) Is the effect size plausible?

    The study found a 41% lower HR of CHD death or MI with the CT guided strategy. I agree that it is a suspiciously large effect size and critical review is warranted. It should be noted that the endpoint was driven by myocardial infarctions while CHD deaths were very few. Next, we should examine if the observed event rates are consistent with what one would expect in the given population and interventions. Indeed, an observed MI rate of around 0.7%/year in the control group is very much consistent with data from similar populations. The rate was 0.4%/y in PROMISE but consistent with SCOT-HEART when adjusting for the much lower population risk in PROMISE (less than 10% had obstructive CAD by cardiac catheterization).

    Importantly, there was a 25% lower risk of MI in the CT guided strategy in PROMISE—which did not receive much attention because the difference is not statistically significant. However, the results of PROMISE provide strong support for the plausibility of the SCOT-HEART results. Additional evidence in support of the SCOT-HEART results come from a very large European registry (encompassing almost 87,000 patients) which revealed a 29% hazard reduction for MI with a CT vs. stress testing guided management strategy in patients with stable chest pain (Jorgensen ME, JACC 2017).

    Therefore, we have evidence from two RCT and one large registry (encompassing together >100,000 patients in different countries and healthcare systems) demonstrating a reduction of MI with a CT guided approach vs. stress testing. That’s pretty compelling evidence that there indeed is an effect.

    Now, the effect size in SCOT-HEART is somewhat larger, however, than in PROMISE and in the European registry. There are three likely reasons: 1) the much longer follow up in SCOT-HEART allows for more cumulative events and, conversely, more time for prevention to work. 2) It is likely that the prompting of the providers to start preventative measures had an additional effect in SCOT-HEART--as you pointed out. However, I would not hold this against the trial or consider this a reason for limited external validity. On the contrary, to recommend preventive measures in the setting of coronary atherosclerotic disease is not only reasonable, it is consistent with the study hypothesis. Indeed, not to recommend prevention in this setting dilutes the effect of these trials. As such, SCOT-HEART did not over-perform, but PROMISE underperformed for this reason. 3) The population risk is substantially greater in SCOT-HEART compared to PROMISE as evident by the rates of obstructive CAD in these studies. PROMISE was criticized for including a very low risk population with short follow up where it is difficult to show differences between groups.

    The system does not allow to write long responses--I will post the second part of my responses in a separate comment.

  2. Part II

    How can we explain such large effect size with rather small differences between groups in terms of preventive medications?

    As you suggested, the tables suggest a rather small percentage difference in the rates of medications among groups. The problem here is that by averaging the number of patients on certain therapies, we neglect the fact that a substantial number of patients were taken off preventive measures as a result of normal CTA results. Table S3 shows that 14% of patients in the CT guided arm received new preventive treatment compared to only 4% in control. At the same time, 4% of the CT arm patients were taken off preventive therapy vs. essentially zero in control. Therefore, follow up averages underestimate the difference in new preventive treatment folks received in the CT guided arm. Furthermore, the data does not reveal if the intensity of prevention/medication dosing changed as a result of preventive efforts. It is conceivable that providers increased statin doses in patients already on statins whose CT revealed atherosclerotic disease.

    It is also important to recognize that the results are not due to one intervention, such as statin therapy, but to a whole array of "preventive measures"--some not even accounted for. In addition to statins—which have been proven to reduce MI and CV death in similar populations--patients were also started on antiplatelet therapy—which is also associated with reductions in MI and CV death. New interventions are typically tested in addition to standard of care and the added benefit is often small. Here, we have the curious situation that some patients did not get any directed prevention (in patients with normal stress test results) and others got a whole array of preventive measures (patients with coronary atherosclerotic disease by CT). It appears the cumulative effect is quite impressive.

    3) Are the results inflated by ascertainment bias?

    I agree that there is potential bias since no event adjudication occurred. However, I find it unlikely that there would be a substantial bias based on treatment assignment in these randomized groups. The fact that PROMISE and the European registry showed similar results with shorter follow up and lower risk populations suggests to me that while SCOT-HEART results may be on the upper range of the effect size, study results remain within plausibility.


    4) What is the external validity of SCOT-HEART?

    As outlined above, I do not believe that protocolizing preventive measures in response to evidence of coronary atherosclerotic disease reduces external validity of the trial. On the contrary, I believe this measure is not only reasonable but very appropriate and applicable. In other words, the same prescription should be given in clinical practice if similar results are to be expected.


    5) Conclusions

    I believe the fact that two RCT and one large registry (encompassing together >100,000 patients in different countries and healthcare systems) revealed a reduction of MI with a CT guided approach vs. stress testing is of major significance and should result in immediate change to our practice (as already done in the UK where they use CTA as first line test in patients with stable CP). We perform approximately 10,000,000 stress tests in the US each year. Even if the effect size in clinical practice is smaller than in SCOT-HEART, e.g., a 30% risk reduction for MI when combining SCOT-HEART and PROMISE, we may prevent many thousands of myocardial infarctions each year by identifying symptomatic patients who have coronary atherosclerotic disease—not detectable by stress testing—who benefit from prevention. As such, I believe the SCOT-HEART trial ranks among the most important clinical trials in cardiology of the past years.

    Kind regards,
    Armin


    My disclosure: I am directing a Cardiac CT Lab. My main clinical and investigational focus, however, is CHD and I have been interested in CT solely because I believe it is currently the best tool to identify patients with CHD and guide their management.


  3. SCOT-HEART 2

    The major issue surrounding SCOT-HEART is not the accuracy of the diagnostic method, but the implausible benefit attributed to a diagnostic (non-therapeutic) intervention: a 41% reduction of outcomes in a subgroup of low-risk stable patients (whose annual mortality is around 3%), when in an RCT (COURAGE) PCI plus optimal medical therapy, compared with medical management alone in stable CAD, was not able to reduce the combined outcome of death, AMI, new interventions or hospitalizations for ACS, even with full supporting therapy (ACE inhibitors, statins, beta-blockers and aspirin). In addition to all this, there are serious methodological limitations. The method is good, very good, for what it proposes; but to attribute to it a therapeutic efficacy of that magnitude does not seem appropriate to me.

    SCOT-HEART is a clinical trial testing the hypothesis that the use of CTA plus a strategy to guide therapy in stable CAD is superior to a conventional evaluation strategy in reducing coronary death plus nonfatal AMI.

    The study is an open-label clinical trial with breach of blinding (an open design favours this methodological problem), which implies a high probability of measurement and performance bias; there was no event adjudication committee, and events were ascertained by ICD-10 codes in the electronic medical record.

    Of the total of 4,146 patients, 81 in the conventional strategy group had the primary event versus 48 in the CTA group, a result in favor of CTA with HR 0.59, i.e., a 41% risk reduction, with p = 0.004.

    The expectation was that 13% of events would occur in the control group and that a reduction of 2.8% in favor of CTA would be detected with 80% power at p < 0.05. However, an event rate of 3.9% occurred in the conventional group and 2.3% in the CTA group, giving a reduction of 1.6 percentage points. In this way the study was underpowered to detect the difference between the strategies, due to an inadequate sample size. Thus, the p-value of 0.004 may not be true.

    We can use the fragility index to assess the robustness of the p-value: it evaluates how many events would have to change to make the p-value non-significant by Fisher's exact test. In the SCOT-HEART case, since there was no adjudication committee, if there was an error in the identification of cases of AMI and about 10 diagnoses of AMI were added to the CTA group (or 10 events removed from the conventional group), the p-value of this study would be > 0.05. Given the methodological limitations of the study, there is a high probability of bias and of the study having a false p-value.
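    This fragility reasoning can be sketched numerically. Two caveats of mine: to make the result less significant, the ~10 events must be added to the arm with fewer events (CTA) or removed from the conventional arm; and the stdlib Fisher-exact implementation below is my own sketch, using the 2×2 counts (48/2073 vs. 81/2073) quoted earlier in the post.

```python
from math import lgamma, exp

def log_comb(n, k):
    """Log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]]:
    sum of all hypergeometric tables no more probable than the observed one."""
    n1, n2, K = a + b, c + d, a + c
    def log_pmf(k):
        return log_comb(n1, k) + log_comb(n2, K - k) - log_comb(n1 + n2, K)
    lp_obs = log_pmf(a)
    p = 0.0
    for k in range(max(0, K - n2), min(K, n1) + 1):
        lp = log_pmf(k)
        if lp <= lp_obs + 1e-7:  # tolerance for float comparison
            p += exp(lp)
    return p

# SCOT-HEART as reported: 48/2073 events (CTA) vs. 81/2073 (standard care)
p_observed = fisher_two_sided(48, 2073 - 48, 81, 2073 - 81)

# Shift ten events into the CTA arm to probe fragility: 58 vs. 81
p_shifted = fisher_two_sided(58, 2073 - 58, 81, 2073 - 81)

print(round(p_observed, 3), round(p_shifted, 3))
```

    With the observed table the p-value lands near the reported 0.004; after shifting ten events it rises to the vicinity of the 0.05 threshold, illustrating the fragility described above.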

    There are publications that have evaluated how many scientific papers with p < 0.05 are in fact true; it is estimated that only 10% of them are. Thus, published works may carry a falsely significant p-value (if only 10% are true, then of 1,000 publications only 100 will be true, despite statistical significance at the 5% level). Accepting false evidence as true (type I error) implies great harm to the patient and to the scientific community, and leads to medical reversal.

