Clinical trials are typically analyzed within a hypothesis testing framework. We ask whether a treatment is significantly better than control or, for an equivalence trial, whether we can exclude an important difference between two treatments. Usually, strong evidence is required to reject the null hypothesis. This is generally considered a good thing. For instance, we do not want to use a drug, and expose many thousands of patients to possible side effects, unless we have good reason to believe that it will be of benefit. Yet the philosophical conservatism of this approach does not come without costs of its own. In particular, the hypothesis testing framework can require extremely large sample sizes, making it infeasible to address what would otherwise appear to be quite sensible questions. As a motivating example, the risk of long-term urinary dysfunction after radical prostatectomy is approximately 20%. Imagine that surgeons disagreed over one particular step in the radical prostatectomy, such as the order in which structures are dissected, and that practice varied across the community. We might assume that such relatively minor differences in approach would be unlikely to lead to large differences in outcome. A traditional “superiority” trial to determine whether there was a 1% absolute difference between groups (small, but worth having for such a trivial change in surgical procedure) would require about 50,000 patients.
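The order of magnitude of this figure can be checked with the standard normal-approximation formula for comparing two proportions. The sketch below is illustrative only: the text does not state the design parameters, so the 20% versus 19% event rates, the two-sided alpha of 0.05, and the power of 80% are assumptions, and the function name `n_per_group` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(p_control, p_treatment, alpha=0.05, power=0.80):
    """Patients per arm for a two-sided comparison of two proportions.

    Uses the unpooled-variance normal approximation. The default alpha and
    power are assumed values, not taken from the text.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    delta = p_control - p_treatment                # absolute risk difference
    return ceil((z_alpha + z_beta) ** 2 * variance / delta ** 2)


# Assumed scenario: 20% dysfunction with one technique, 19% with the other.
per_arm = n_per_group(0.20, 0.19)
total = 2 * per_arm
print(per_arm, total)
```

Under these assumptions the calculation gives roughly 25,000 patients per arm, consistent with a total of about 50,000. Requiring higher power would increase the total further.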
The natural alternative to hypothesis testing is to focus on estimation, that is, to “pick the winner”. This is analogous to “kai zen”, continuous quality improvement as pioneered by Toyota. The underlying rationale for kai zen is that, if there is not much to choose between two approaches in terms of disbenefit, it is worthwhile to go with the approach most likely to give the best outcome rather than to assume no difference unless a difference is demonstrated statistically. In this paper, we compare and contrast the traditional and kai zen approaches to clinical trial analysis, using surgical modifications as the motivating example. The key assumption is that clinical trial resources are limited, and that resources can be expended either on a limited number of definitive trials providing strong evidence, or on a larger number of kai zen trials providing a probabilistic guide to action.
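To make the phrase “probabilistic guide to action” concrete, the following sketch approximates the probability that a trial of a given size picks the truly better arm when the decision rule is simply to choose the arm with the lower observed event rate. It is a minimal illustration under a normal approximation, not the analysis used in this paper; the function name `prob_pick_better` and the example event rates and sample sizes are assumptions.

```python
from math import sqrt
from statistics import NormalDist


def prob_pick_better(p_a, p_b, n_per_arm):
    """Approximate probability that the arm with the lower true event rate
    also shows the lower observed event rate.

    Normal approximation to the difference in observed proportions;
    ties are ignored. With equal true rates the result is 0.5.
    """
    delta = abs(p_a - p_b)  # true absolute risk difference
    se = sqrt((p_a * (1 - p_a) + p_b * (1 - p_b)) / n_per_arm)
    return NormalDist().cdf(delta / se)


# Assumed scenario: true event rates of 20% and 19%.
for n in (500, 1000, 5000):
    print(n, round(prob_pick_better(0.20, 0.19, n), 3))
```

Even a trial far too small to reject the null hypothesis picks the better arm more often than a coin toss, and the probability rises with sample size. This is the sense in which a smaller trial can still guide action.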