The CITRIS-ALI trial was a negative trial recently published in JAMA, featuring a striking graph whose curves and numbers show a 45% relative reduction in mortality in patients with sepsis or ARDS treated with vitamin C. The study had 55,000 accesses and an Altmetric score of 494, generating excitement and positive tweets.
In this text, I will not waste the reader's time discussing the methodological flaws of a "positive" secondary analysis within a primarily negative trial: in CITRIS-ALI the primary outcomes were surrogates of clinical outcomes, and mortality was just one of 46 secondary endpoints tested; the difference was not truly statistically significant once properly adjusted for multiple comparisons (P = 0.03 across 46 comparisons). But this post does not lend itself to the obvious. These things do not make CITRIS-ALI unusual. In fact, it is just "one of those".
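To make the multiple-comparisons arithmetic explicit: a simple Bonferroni correction (one conservative convention; the trial did not pre-specify any particular adjustment, so this is only a sketch) multiplies the nominal P value by the number of comparisons and caps the result at 1:

\[
P_{\text{adj}} = \min(1,\; 46 \times 0.03) = \min(1,\; 1.38) = 1
\]

By this standard, the mortality difference is nowhere near the conventional 0.05 threshold.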
What I do intend to discuss are non-intuitive concepts that require careful explanation. After all, scientific thinking is not exactly intuitive.
Why should a surrogate study not have death as a secondary outcome?
First, we need to review the real scientific purpose of secondary outcomes: refining knowledge about the primary outcome, whether positive or negative. Thus, the secondary outcome is, by its very origin, subordinate to the primary outcome.
Secondary outcomes, if properly applied, explain the primary outcome. In this process, we start from the primary result (most important) and then look at the secondary ones (less important). Let us look at some examples.
If death is the primary outcome and the result is positive, it is interesting to know the mechanisms of the mortality reduction. In this case, the secondary analysis of cause-specific death gains importance. Death is actually a net outcome, the combination of multiple types of death, so it is interesting to know which type contributed most to the final result. Similarly, when there is no mortality reduction, it is important to understand whether this occurred because the treatment had no impact at all, or because it reduced one type of death while increasing another (a complication of treatment).
The more relevant an outcome is, the closer it sits to the final pathway. Intermediate steps along that pathway therefore tend to be less important outcomes. So the nature of an explanatory (secondary) outcome is to be less important than the primary outcome.
When a study defines a final-pathway outcome (death) as a secondary endpoint, it inverts this logic and creates bias: the secondary outcome tends to invade the protagonism of the primary outcome. Death in the secondary position does not serve the explanatory purpose; rather, it tends to "steal the scene" if positive.
The phenomenon of outcome interpretation bias takes place when a positive secondary outcome softens the negativity of a study. In the case of death, the effect goes beyond softening: it nudges our thinking towards positivity. We cannot help it. And it gets worse when we realize the nudge is based on a flawed analysis of low positive predictive value, owing to low power and multiple comparisons.
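This low predictive value can be made explicit with the framework popularized by John Ioannidis ("Why Most Published Research Findings Are False"), in which the probability that a claimed positive finding is true depends on the prior odds R of the hypothesis, the power (1 − β) and the significance level α. The numbers below are illustrative assumptions, not estimates derived from CITRIS-ALI:

\[
\mathrm{PPV} = \frac{(1-\beta)\,R}{(1-\beta)\,R + \alpha} \approx \frac{0.30 \times 0.05}{0.30 \times 0.05 + 0.05} \approx 0.23
\]

Even granting prior odds of 0.05 and 30% power for a secondary mortality endpoint, fewer than one in four such "positive" signals would be true; multiple comparisons effectively inflate α, pushing the predictive value lower still.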
Therefore, death should not be listed as a secondary endpoint in studies of surrogate or intermediate outcomes. If it is, we run the risk of finding a probably false but very exciting result. So exciting that it justifies building a graph to be spread in tweets, generating visibility, enthusiasm and citations.
Why should pretest probability not be influenced by the logic of potential mechanisms?
Based on potential mechanisms of beneficial effect, some have considered the CITRIS-ALI hypothesis worth testing. Well, it is not. And I will explain.
It has been known since the dawn of medical-scientific thinking that biological plausibility is not the same as probability of truth. In fact, this is one of the basic principles of evidence-based medicine. Null treatments often have several plausible mechanisms of action stated in the elegant introductions of clinical trials. Therefore, it is not the existence of possible mechanisms that makes a phenomenon more or less likely to be true. Daniel Kahneman described "confidence by coherence", an illusion that mistakes coherence for truth. It lies at the core of the belief that a fancy theoretical mechanism justifies a study.
So what is the point of elaborating mechanisms? They serve the emergence of an idea. But between conceiving the idea and deciding to run a study, one needs to move to a realistic estimate of pretest probability.
How to estimate pretest probability?
Here I am not talking about Bayesian statistical analysis. I am talking about Bayesian reasoning, a natural mode of human thinking.
Estimating the pretest probability of a hypothesis resembles clinical reasoning. When faced with a clinical picture, it is not the representativeness heuristic that best estimates pretest probability. In fact, it is best to use epidemiological data that do not concern the clinical picture itself. That is, the prevalence of the disease in a given circumstance is the true pretest probability, not the clinical similarity ("confidence by coherence").
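A toy calculation with invented numbers illustrates this. Suppose a disease with a prevalence of 1%, and a clinical picture so "typical" that it behaves like a test with 90% sensitivity and 90% specificity. Bayes' theorem gives:

\[
P(D \mid +) = \frac{0.90 \times 0.01}{0.90 \times 0.01 + 0.10 \times 0.99} \approx 0.08
\]

Despite the striking resemblance, the posttest probability is only about 8%: prevalence, not coherence, dominates the estimate.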
The same is true when we think of a study. The pretest probability is not how logical the idea seems; probability arises from epidemiology. In the field of vitamins, studies are consistently negative, even when high expectations had been placed on vitamins preventing cancer or cardiovascular disease. This is the epidemiology of vitamins as potential treatments, and it indicates a low probability of a true effect.
We must recognize: a vitamin reducing the mortality of a serious disease like sepsis by nearly half is simply "too good to be true".
There is also a second component of pretest probability: previous studies that tested the same specific hypothesis, the so-called exploratory studies.
But why are "exploratory studies" trivialized?
The trivialization arises from the confusion between a bad study and an exploratory study. Often a study with a high risk of bias or random error shows an interesting result. Hence, enthusiasts call it exploratory or hypothesis-generating. But studies of low predictive value generate nothing but enthusiasm. They do not generate hypotheses!
Exploratory studies should consist of good empirical observations, with a risk of bias or random error low enough to shape the likelihood of a hypothesis being true, yet insufficient to confirm it.
Therefore, this vitamin C study does not generate any hypothesis of mortality reduction to be tested by future studies. Its result generated enthusiasm, but it is not capable of shaping the likelihood of the effect being true. It merely reinforces a belief that will justify future studies testing what did not need to be tested in the first place.
But why doesn't it need to be tested by future studies?
Because it would waste our resources (mental, temporal and financial) on future studies. If the future study were negative, we would already have known it; if positive, it would only raise the probability from very low to low. For confirmation, it would take several positive studies to prove an unlikely hypothesis.
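In Bayesian terms, again with purely illustrative numbers: grant the hypothesis a pretest probability of 1% (very low), and grant a future positive trial the best-case likelihood ratio of a well-designed study, roughly power divided by α (0.80/0.05 = 16):

\[
\text{posterior odds} = \frac{0.01}{0.99} \times \frac{0.80}{0.05} \approx 0.16 \;\Rightarrow\; P \approx \frac{0.16}{1.16} \approx 0.14
\]

Even a clean positive result would move the hypothesis from about 1% to about 14%: from very low to low, as stated above.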
Yes, it is true that some discoveries emerge as black swans, unpredictable events that change the world. But these, unpredictable as they are, are not hypotheses previously fabricated and tested by poor-quality studies. They arise by chance, as in the discovery of penicillin.
Administering vitamin C may not cause much harm to the patient, but it does harm collective cognition when belief replaces rationality.
Faith is not bad; it is actually innate to our species. But the value of faith lies in religious, spiritual or personal spheres. We should not let the personal value of beliefs invade the scientific domain, nor the professional one. Medical professionalism lies in respecting the order of things, based on evidence.
This text is not about vitamin C. Vitamin C was just a hook to spark a much more important discussion than its "effectiveness" in sepsis.
This text is about a scientific ecosystem of low-probability hypotheses, tested by studies with severe methodological limitations, in which spin, outcome reporting bias and publication bias generate positive studies of low predictive value, which in turn promote guideline recommendations destined to suffer "medical reversal" years later.
The question is whether we want to continue this fantasy or take a truly scientific, professional attitude. Science is not a search for news; it is the humble attempt to understand how the world works and to find real solutions to our problems.
As the late Douglas Altman once said, "we need less research, better research and research done for the right reasons."