This is a rather belated review of the book Tarnished Gold: The Sickness of Evidence-based Medicine, by Steve Hickey and Hilary Roberts. It was published in 2011, and my attention was drawn to it when I was asked to comment on its claims, from the viewpoint of an experienced clinical research specialist. Otherwise, I don’t think I would have heard of it, as it hasn’t made any impact in my professional field. In the event, it took me very little time to see why that is the case. The review may appear to meander back and forth between concepts, but that’s because it follows the book, which does the same. So apologies for the repetition.
A shaky start
The introduction is peppered with frank errors, for example the claim that very large clinical trials are incapable of identifying the causes of diseases. They are not designed to do that! Then I came across the term `orthomolecular medicine’, which is megadose vitamin therapy by another name. This is the obsession which Linus Pauling developed in his later years, and it is seen by most nutritionists as overt quackery. The foreword is written by perhaps the leading quack in the field. This man says that evidence-based medicine claims to provide certainty about treatments. If he knew the tiniest thing about it, he would know that there are no certainties in science or medicine.
Clearly Hickey and Roberts do not understand what evidence-based medicine (EBM) is, or how it is used. They witter on ad nauseam about the problems with epidemiological evidence, apparently unaware that randomised controlled trials are not epidemiology. There is a long and tedious section about `proof’. These two cannot have spoken to any EBM researchers, or if they did they ignored what they heard. We do not talk about `proof’; all we do is obtain the best evidence we can, always understanding that it might be overturned by later and better research.

How facile to say that you can’t select clothes for people by averaging their measurements! They are either being deliberately obtuse, or they really are stupid. EBM works by providing evidence of what is most likely to be an effective treatment, and that evidence is then weighed alongside as much information as possible about the individual patient. The authors get this wrong by saying that EBM claims certainty. It does not. They say that the best information comes from the patient, but how on earth is the doctor going to select a treatment solely from information supplied by the patient? I have yet to see them answer that question, or even ask it. The patient is not going to tell them which antibiotic works for the infection they have.

I laughed out loud at the bit about blood pressure. They seem to think that doctors are going to give the entire population antihypertensives! They claim that no `epidemiological’ trials have ever been shown to save lives. But they seem to mean large-scale intervention trials (epidemiology is not interventional), and there is not a shadow of doubt that some of these have shown reductions in mortality. If they think large-scale randomised controlled trials are useless, then they can forget about receiving any effective treatment when they get cancer. Mortality from cancer in young people has been halved in the last 40 years.
In bed with the usual suspects?
I realised that this book could not be taken seriously when the authors extolled the virtues of complementary and alternative medicine (CAM) practitioners. These are people who mainly lie to their patients about how the body works, and give treatments which have either not been tested, or (mostly) have been tested and do not work. For example, acupuncturists say the body is pervaded by a life force which travels along `meridians’ they have found on the body. Chiropractors say the life force travels along the spine. Can they both be right? No, they are both making it up.
There are no plausible proposals in the book for replacing EBM. The authors forget that while analogies can be useful for explaining an argument, they do not prove the point. They like exposing what they think are fallacies, but fall victim to the false-analogy fallacy themselves. However, I have now found out what drives their vitriolic attack. They had a spat with the US National Institutes of Health some years ago, about orthomolecular medicine (AKA megadose vitamin therapy, qv), a thoroughly discredited brand of quackery as we know. Hickey and Roberts display all the attributes of the `prophet crying in the wilderness’. This vitamins saga crops up repeatedly, so they are obviously still smarting over it. I also came across a reference to Dr Devra Davis and her book The Secret History of the War on Cancer. A little research on Dr Davis shows that she is another conspiracy theorist, who has been raking over the mobile-phones-cause-cancer canard.
Hickey and Roberts trot out a litany of EBM abuses, but do not realise that when people fail to follow the EBM model, they do not invalidate it. Time and again we are told that it’s wrong `arbitrarily’ to discard data. They do not seem to know that selection of studies for meta-analysis and systematic review (they don’t appear to know the difference) is very far from arbitrary, but based on quality. As information specialists they ought to know about signal to noise ratio. There is a vast amount of noise in medical data, because humans are complex and we can’t possibly know all that is going on inside them. By rejecting poor quality studies we reduce the noise and hence make the signal easier to spot. They present meta-analysis as a way of gerrymandering the data to get statistically significant results. In my experience it is just as likely to show the opposite, that there is nothing happening. In some fields there are several studies with unclear and conflicting results, and a meta-analysis can show that taken together there is no treatment effect. That’s what you get with homeopathy. Of course, the authors don’t like that sort of outcome, and make some barbed comments about sceptics.
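The pooling the authors object to is easy to demonstrate. Here is a minimal sketch in Python of a fixed-effect, inverse-variance meta-analysis; the three study results are invented for illustration, but they show how conflicting small studies can combine into an estimate whose interval comfortably includes zero, ie no treatment effect.

```python
import math

def pooled_effect(effects, variances):
    """Fixed-effect inverse-variance meta-analysis: each study is
    weighted by 1/variance, so noisy studies count for less."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

# Three hypothetical small studies with conflicting effect estimates:
effects = [0.40, -0.35, 0.05]
variances = [0.04, 0.04, 0.02]
est, se = pooled_effect(effects, variances)
# The 95% interval (est +/- 1.96*se) straddles zero: taken together,
# no evidence of a treatment effect.
```

This is, of course, only the arithmetic; the study-selection step the authors complain about happens before any numbers are pooled, and is based on quality, not caprice.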
While Hickey and Roberts appear to know how a sample size is calculated, they don’t understand how it is used in practice. A key input is the expected difference between treatments, which has to be justified by prior information. For an early-phase study that information would come from animal studies, and it might be modified by what is known about drugs similar to the one under study; eg we might have reason to believe that our drug is better than the established treatment, so we would expect a bigger difference. Now, they are great fans of Bayesian statistics, and there is nothing wrong with that per se; but just as with the conventional (frequentist) approach I have outlined, the Bayesian approach relies on what is called the prior probability. That has to be justified as well – you can’t pull a probability out of the air. The tragedy for these two is that, if they did that honestly for their beloved orthomolecular medicine, they would get a prior probability of essentially zero, as there is no sound evidence that it works.
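To make the frequentist side of that concrete, here is a rough sketch (Python, standard library only) of the usual sample-size formula for comparing two proportions. The inputs – the event rates expected under each treatment – are exactly the prior information that has to be justified; the rates used below are made up purely for illustration.

```python
import math
from statistics import NormalDist

def n_per_arm(p1, p2, alpha=0.05, power=0.80):
    """Approximate per-arm sample size for a two-sided comparison of
    two proportions (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # about 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# A bigger expected difference means a smaller trial:
# n_per_arm(0.50, 0.40) is several hundred per arm,
# n_per_arm(0.50, 0.20) only a few dozen.
```

If the assumed difference is optimistic, the study is underpowered; that is why the input has to be justified rather than plucked from the air, which is precisely the same demand that Bayesian priors make.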
I will concede that the authors are right about trials getting much bigger, and they are right about the common mistake of quoting relative risk instead of absolute risk. But the latter abuse is perpetrated much more by the lay media than by researchers. Journalists can write a story about a mortality reduction of 50%, but not about a fall from 1% to 0.5% (they are the same thing).
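The arithmetic behind that last sentence is trivial, but worth spelling out; the numbers here are the ones from the paragraph above.

```python
control_risk, treated_risk = 0.01, 0.005

relative_reduction = (control_risk - treated_risk) / control_risk
absolute_reduction = control_risk - treated_risk

# relative_reduction is 0.5: the headline-friendly "50% reduction".
# absolute_reduction is 0.005: half a percentage point.
# Same trial, same result, very different-sounding numbers.
```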
They say that EBM ignores case reports. No, no, no. Case reports, and in particular case series, are useful indicators of whether there is a phenomenon that should be formally investigated. Where there is doubt, however, case reports on their own are of very little value.
They seem so obsessed with their own cleverness that they fail to spot the obvious. Do they not know that the BMJ paper about the lack of randomised controlled trials (RCTs) for parachutes was a seasonal joke? The point is that RCTs are only required when the outcome at issue is in doubt. Nobody needs an RCT for splinting broken legs. Similarly, they don’t realise that the website about the Daily Mail’s twin obsessions with things that cause cancer, and things that cure it, is a jibe against tabloid journalism and not against EBM.
An `ideal’ non-EBM study
There is a curious section about a `simple Bayesian study’ of diabetes drugs conducted by `Dr Carlos’, a primary care physician. The authors claim that using this approach, with a very small number of patients, the doctor will be able to predict the correct treatment for the next patient. I have taken advice from two statisticians, Dr Adam Jacobs and Dr Alastair Knight, and the pharmacologist Professor David Colquhoun who has written a statistics book. Here is our collective assessment.
The authors say nothing about what prior is used for the Bayesian calculation. They are testing the difference between drugs, so they should specify a prior for that difference. If they have no good reason to believe A is better than B (or vice versa), it would be a uniform (uninformative) distribution, ie one expressing no preference for either drug. In that case, Bayesian calculations give you essentially the same result as conventional statistics.
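That equivalence is easy to demonstrate. With a uniform Beta(1, 1) prior on a response rate, the mode of the Bayesian posterior is exactly the frequentist estimate, the observed proportion. A minimal sketch, with illustrative numbers of my own choosing:

```python
def beta_posterior_mode(successes, n, a=1.0, b=1.0):
    """Mode of the Beta(a + successes, b + failures) posterior for a
    response rate. With a uniform prior (a = b = 1) this equals the
    frequentist maximum-likelihood estimate, successes / n."""
    a_post = a + successes
    b_post = b + (n - successes)
    return (a_post - 1) / (a_post + b_post - 2)

# 7 responders out of 10 patients, flat prior: the posterior mode is 0.7,
# identical to the plain observed proportion.
```

Only an informative prior, which would itself have to be justified, makes the Bayesian answer differ from the conventional one.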
The authors call this an ideal Bayesian study and highlight the supposed benefits of this design over EBM, by which they presumably mean a frequentist approach. However, the authors fail to recognise that Bayesian study designs can be included in EBM assessments, meta-analyses and the like. Furthermore, the statement that “Dr Carlos can immediately see the likely benefits…” highlights the authors’ disregard for statistical interpretation. They fail to explain how he can do this, and the fact is that he can’t. This approach is no better than the conventional one.
The point seems to be that the authors claim a Bayesian analysis has all sorts of advantages over a more traditional frequentist analysis. To be fair, they provide a graph of the probability distributions which is a reasonably good way of visualising the results. However, talking about the normal way of presenting the results as being “gobbledygook” seems unjustified. You could easily present these results as a risk ratio or a risk difference, with confidence interval. The risk difference could, if wished, be converted to the number needed to treat, which is a pretty intuitive way of presenting the results.
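To show how far from gobbledygook those presentations are, here is a sketch computing all three summaries from a hypothetical 2×2 trial result; the counts are invented for illustration.

```python
def summarise_trial(events_treated, n_treated, events_control, n_control):
    """Risk ratio, risk difference and number needed to treat (NNT)
    from simple event counts in a two-arm trial."""
    risk_t = events_treated / n_treated
    risk_c = events_control / n_control
    risk_ratio = risk_t / risk_c
    risk_difference = risk_c - risk_t      # absolute benefit of treatment
    nnt = 1.0 / risk_difference            # patients treated per event avoided
    return risk_ratio, risk_difference, nnt

# 5/100 events on treatment vs 10/100 on control:
rr, rd, nnt = summarise_trial(5, 100, 10, 100)
# rr = 0.5 (half the risk), rd = 0.05, nnt = 20.
```

“Treat 20 patients to prevent one event” is about as intuitive a statement of a trial result as anyone could ask for.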
The bit which is most egregious nonsense is the statement “Bayesian statistics are not highly sensitive to the experimental conditions”. If you do a flawed study, then you’re going to get flawed results whether you analyse them in a frequentist or Bayesian way.
We are not trying to argue against Bayesian statistics. They certainly have their place, and in some places can provide a much more intuitive way of looking at things than frequentist statistics. We’d probably be using them a lot more were it not for the regulators’ reluctance to believe in them. But to pretend that it’s somehow something different from “EBM” is nonsense. It’s just a different way of analysing the results.
As stated above, their approach adds nothing to the conventional approach. You could construct distribution curves for the actual observed values of a properly designed study – ie with continuous data rather than this arbitrary dichotomous outcome they have used here (Dr Carlos’ categories of `responder’ and `non-responder’). That would give at least as much information as they have here.
They keep saying that data from large clinical trials can’t be used to predict the response of individual patients, and that this is an example of the ecological fallacy. Now, that fallacy says you can’t draw inferences about an individual from summary data for the population of which the individual is a member. Their example is that you can’t average shoe sizes and say that’s the size for anyone in the population. The first mistake here is that clinical trials don’t even try to say anything definitive about any patient; they simply estimate the probability of the response you might get. As people who go on at such length about probability distributions, you would think they would know that. Next, they put together this tiny study by `Dr Carlos’ and say that from the responses seen in the patients so far, they can predict the response of the next patient. So they are deriving definitive data about the next patient from all the ones seen so far. Or, if they defend themselves by saying that they are only predicting a probability, then they are doing exactly the same as a conventional clinical trial, but more weakly.
The ecological fallacy is about making inferences about individuals from populations. You might, for example, say that the French drink a lot of wine and have a low incidence of heart disease, and conclude that wine therefore reduces the risk of heart disease. That’s fallacious (despite probably being true anyway!) because we don’t know whether the specific French individuals who live to a ripe old age with healthy hearts are the same ones who are drinking a lot of wine. However, in a clinical trial, the situation is completely different. In a randomised trial of X vs placebo, we know that the individuals in the placebo group got placebo. We don’t see how they could claim that clinical trials are subject to the ecological fallacy without talking bollocks.
And yes, you can’t predict with certainty what will happen to an individual patient based on conventional clinical trial data. But you equally can’t do so based on a Bayesian analysis. The only way you could make predictions for individual patients (and of course this is starting to happen as part of the trend towards “personalised medicine”) is to collect a lot of data on patient characteristics, biomarkers, etc, and use them to define which patients will respond to a specific treatment and which won’t. And then, of course, you still can’t predict with certainty, although you can certainly improve the odds.
Back to megadose vitamins
A bit later there is a long section on their favourite topic, megadose vitamin C. I will keep my comments about that short. The evidence they cite goes back to Dr Fred Klenner in the 1950s, who claimed to have treated polio successfully. The authors claim that the results were replicated, but all of this evidence appears to be case reports, ie it is anecdotal. There is a lot of bleating about how expensive it would be to run the huge study that would satisfy the EBM crowd, but my question is this. Why has nobody carried out the small, cheap Bayesian study that Hickey and Roberts were extolling earlier in the book? There has been no attempt by any of the megadose vitamin proponents to do a randomised comparative study on any scale, however small. The entire argument relies on anecdotal evidence. Hickey and Roberts also argue for focussing on effects that are obvious, rather than running huge studies to detect small ones. They say the effects of megadose vitamin C are dramatic. So why no study designed on that basis? Do they protest too much?
A major theme is that EBM is selective, and that ALL the data must be used to make decisions. Yet at the end they recommend heuristics, a kind of informed guesswork, which exposes a further conflict in the authors’ thinking. This, they say, is an effective way to make medical decisions using minimal data. Well, it can be – triage can be heuristic-based. But how does this fit with their insistence on accepting all data? It makes no sense.
Another repeated mistake is that Bayesian statistics are anathema to EBM. This is rubbish. They can be used in EBM, and are, but the regulators are conservative and it’s hard to educate them. But of course, while Hickey and Roberts think they understand Bayesian stats, they don’t really understand how to use them, as we saw from the `Dr Carlos’ trial.
The obsessive objection to large-scale trials ignores what actually happens in drug development. The pathway starts small, very small. After studies in healthy volunteers to get initial assessments of safety, pharmacokinetics etc, studies of typically 100 patients are carried out to test whether the drug has a clear effect. This is exactly the sort of `sticks out like a sore thumb’ test that Hickey and Roberts recommend. They don’t appear to have noticed that it happens already, and has done for decades. After that stage, much larger studies are carried out to confirm that the effect is real. Even those need further support from studies which are more representative of the population seen in clinical practice. Quite often, the large studies fail to show anything useful – the reverse of the authors’ claim that big studies are designed to mislead us by detecting insignificant effects.
The authors include a huge number of references, but a great many are misquoted and misinterpreted, by my quick reckoning. Many references are to books and speculative review material by others, rather than to rigorous science.
These authors seem to be well trained and experienced in science, but they are by no means the first to go off the rails and pursue madcap ideas. Isaac Newton spent more time on alchemy and theology than he did on physics and maths.