Young and Karr propose ways to improve how observational studies are conducted
S. Stanley Young, assistant director for bioinformatics at the National Institute of Statistical Sciences (NISS), and Alan Karr, director at NISS, have published a non-technical article in the September issue of Significance magazine pointing out that medical and other observational studies often produce results that are later shown to be incorrect, and, invoking a quality control perspective, suggest ways to fix the system.
Their central point is that the current system of publication in peer-reviewed journals relies on post-production inspection to ensure quality, a practice that has disappeared from modern industry in favor of controlling the process instead: quality control is now process control, not product control. They cite W. Edwards Deming, considered by many the most innovative thinker ever about quality, arguing not only for process control, but also that the problem lies with the managers (funders and journals) rather than with the workers (individual researchers), who respond rationally to the current set of incentives.
Young and Karr describe both their own and others' studies of the extent to which observational studies fail to replicate. Published claims such as "coffee causes pancreatic cancer," or "women eating breakfast cereal are more likely to have boy babies," have been refuted by subsequent studies and analyses. When these studies reach the popular media and influence individual consumers, the burden falls not just on science but also on society. And even if there were no impact on the public, scarce research resources, both money and personnel, have been squandered.
The paper describes several technical difficulties with observational studies, among them multiple testing (if enough questions are asked, some will yield false positive answers), bias (systematic error) and multiple modeling (searching among mathematical models until one is found that "fits the data"). Publication bias is another issue: papers reporting positive scientific results (for example, an association between Type A personalities and heart attacks) are more likely to be published than those reporting negative results, even though the latter may be as important scientifically.
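The multiple testing problem can be illustrated with a little arithmetic. The sketch below is not taken from the article: it assumes every question is tested independently at the conventional 0.05 significance level and that no real effect exists, and the function name is hypothetical. Under those assumptions, the chance of at least one false positive among m tests is 1 - (1 - 0.05)^m.

```python
def familywise_false_positive_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive among independent tests.

    Assumes (for illustration only) that every null hypothesis is true
    and that the tests are statistically independent.
    """
    if num_tests < 0:
        raise ValueError("num_tests must be non-negative")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    # Probability that all tests correctly stay negative is (1 - alpha)^m,
    # so the complement is the chance that at least one is a false positive.
    return 1.0 - (1.0 - alpha) ** num_tests


if __name__ == "__main__":
    for m in (1, 5, 20, 100):
        rate = familywise_false_positive_rate(m)
        print(f"{m:>3} questions asked: {rate:.1%} chance of a false positive")
```

With 20 questions the chance is already roughly 64 percent, which suggests why a study that asks many questions of one data set can easily report a spurious finding.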
Young and Karr recommend that when a study is submitted for publication, the data be split into two sets, a modeling data set and a holdout data set. Journals would then accept or reject papers based on the analysis of the modeling data set alone, without knowing the results of applying the same methods to the holdout set. The journal would then also publish an addendum to the paper giving the results of the analysis of the holdout set.
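One way such a split might be carried out is sketched below. The article does not specify how the data should be divided, so the random assignment, the 50/50 proportion, the fixed seed and the function name are all assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_modeling_holdout(
    records: Sequence[T],
    holdout_fraction: float = 0.5,  # assumed proportion, not from the article
    seed: int = 0,                  # fixed seed so the split can be reproduced
) -> Tuple[List[T], List[T]]:
    """Randomly divide records into a modeling set and a holdout set."""
    if not 0.0 < holdout_fraction < 1.0:
        raise ValueError("holdout_fraction must be strictly between 0 and 1")
    indices = list(range(len(records)))
    # Shuffle positions with a private generator so global state is untouched.
    random.Random(seed).shuffle(indices)
    n_holdout = int(round(len(records) * holdout_fraction))
    holdout_positions = set(indices[:n_holdout])
    # Preserve the original record order within each set.
    modeling = [r for i, r in enumerate(records) if i not in holdout_positions]
    holdout = [r for i, r in enumerate(records) if i in holdout_positions]
    return modeling, holdout


# Hypothetical study with ten subject identifiers.
subjects = list(range(10))
modeling_set, holdout_set = split_modeling_holdout(subjects, 0.5, seed=42)
print("modeling:", modeling_set)
print("holdout: ", holdout_set)
```

Fixing the seed before any analysis begins means the holdout set can be set aside untouched and later checked by the journal, which is the point of keeping it out of the modeling stage.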