Reproducibility in Chemical Research
Graphical Abstract
“… To what extent is reproducibility a significant issue in chemical research? How can problems involving irreproducibility be minimized? … Researchers should be aware of the dangers of unconscious investigator bias, all papers should provide adequate experimental detail, and Reviewers have a responsibility to carefully examine papers for adequacy of experimental detail and support for the conclusions …” Read more in the Editorial by Robert G. Bergman and Rick L. Danheiser.
Reproducibility is a defining feature of science. Lately, however, serious concerns have been raised regarding the extent to which the results of research, especially biomedical research, can be reliably replicated. In this Editorial, we discuss the extent to which reproducibility is a significant issue in chemical research and then suggest steps to minimize problems involving irreproducibility in chemistry.
Sources of Irreproducibility
Problems involving reproducibility range in severity from major to relatively minor. A complete failure to repeat published results rarely occurs in chemistry, but chemists all too commonly encounter difficulty in exactly replicating reaction yields, selectivities, and other data reported in publications; such problems are especially prevalent in synthetic chemistry.
We believe that one source of irreproducibility, the deliberate falsification of data, is certainly not unheard of but is rare in chemistry. More common are cases in which researchers publish incorrect data that they believe to be valid, or modify results, consciously or unconsciously, to fit their preconceptions. A third cause of irreproducibility involves bona fide results that simply prove difficult to repeat in other laboratories.
The deliberate falsification of data can take various forms besides the wholesale fabrication of results. How extensive is the problem of modifying data to support investigators’ interpretation of results? Some evidence on this issue is provided by the editorial staff of the journal Organic Letters, who have found that 2–3 % of submitted manuscripts contain evidence of the manual removal of peaks from NMR spectra, a number that has not decreased over the several years in which spectra have been checked for this problem. Related violations of appropriate conduct include discarding data points that are inconsistent with the desired results and reporting only the best yield or selectivity for a synthetic reaction.
Unconscious Investigator Bias
Some of the most widely discussed cases of irreproducible results in chemistry involve investigators who truly believed that their irreproducible results were correct. Famous examples from chemistry and other disciplines include the celebrated “cold fusion” claim by Pons and Fleischmann, and the Benveniste homeopathy study in which basophil degranulation was claimed to be effected by a solution containing, on average, less than one molecule of certain antibody preparations.
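To appreciate the scale of the dilution at issue, a back-of-the-envelope calculation is instructive; the concentration used below is an illustrative value of our choosing, not a figure from the original study. At a nominal antibody concentration of $10^{-40}$ mol L$^{-1}$, the expected number of molecules in a 1 mL sample is

\[
N = C \, V \, N_{\mathrm{A}} = \left(10^{-40}\ \mathrm{mol\,L^{-1}}\right)\left(10^{-3}\ \mathrm{L}\right)\left(6.022 \times 10^{23}\ \mathrm{mol^{-1}}\right) \approx 6 \times 10^{-20}
\]

In other words, essentially no sample at such a dilution contains even a single antibody molecule, which is why the claimed biological activity drew such skepticism.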
These examples underline one of the most vexing issues in scientific irreproducibility, one that in our opinion too few scientists have thought about: unconscious investigator bias. A particularly revealing study, reported many years ago by the psychologist Robert Rosenthal, involved rats running mazes. One group of experimenters was told they had been given “maze-bright” (well-trained) rats, and a second group was told they had been given “maze-dumb” (untrained) rats. Although the rats had in fact been assigned at random, the rats the experimenters believed to be brighter actually performed better in the mazes. The rats differed only in the experimenters’ expectations of them.
Investigator bias has become something of a cause célèbre in psychological and sociological circles, especially where poor statistical analysis of data seriously exacerbates the problem. However, we believe that this problem exists in the “harder” sciences as well. In addition to the two examples cited above, most chemists know of other instances in which investigators have stoutly defended experimental results, interpretations, or theories long after the community has concluded that they are wrong.
Replicating Valid Results
Falsification and error are two concerns, but we believe that the majority of reproducibility problems encountered by chemists involve bona fide results, that is, results that were actually obtained by the original authors. Replicating results in synthetic organic, organometallic, and inorganic chemistry continues to present problems even for experienced and skilled researchers. Just how challenging it can be to reproduce results in synthetic organic chemistry can be gleaned from the experience of one of us as Editor in Chief of Organic Syntheses. This journal is unique in that every experimental result must be reproduced in the laboratory of one of the distinguished members of the Board of Editors prior to publication. During the period 2010–2016, 7.5 % of the articles submitted to Organic Syntheses were rejected because the yield and/or selectivity reported by the original authors could not be reproduced within a reasonable range in the laboratory of one of the editors. Since authors know that their work will be checked by one of the Organic Syntheses editors, one can assume that the authors did indeed obtain the results they reported and were confident that their work would be reproducible. Note also that Organic Syntheses requires exceptionally detailed experimental procedures, and that if problems arise in the course of checking, the Organic Syntheses editor consults the original authors for assistance. The fact that the results in 1 in 13 articles proved not to be reproducible even with all of these advantages underscores the challenges associated with reproducibility in synthetic chemistry.
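A quick arithmetic check connects the two figures quoted above: a rejection rate of 7.5 % corresponds to

\[
\frac{1}{0.075} \approx 13.3,
\]

that is, roughly one article in every thirteen failing the reproducibility check.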
Recommendations
What steps can we in the chemistry community take to increase the reproducibility of published work? Below we provide recommendations for principal investigators (PIs) and their co-workers, for journal editors and journal advisory boards, and for the reviewers of articles submitted for publication.
PIs should stay in close communication with their co-workers through regular face-to-face research discussions in which primary data and laboratory notebooks are examined. Even in collaborations encompassing widely divergent research areas, individual PIs should make sure that their part of the project is run responsibly. While raising financial support is a critical contribution to a project, co-authorship should also require direct participation in the study and a significant intellectual contribution to the work.
Intelligent skepticism about one’s own results is especially important. PIs who train their co-workers (and themselves!) to be skeptical of results, especially those they want to believe are correct, are good role models. In some areas of research it is good practice for PIs to arrange for “internal checking” of key results prior to the submission of a paper. For example, in the case of a report on new synthetic chemistry, a co-worker without experience in the area might be assigned to repeat a representative example using only the supporting information. This practice helps confirm that the published procedures contain enough detail for the work to be reproduced.
PI bias can extend to the evaluation of co-workers. When potential irreproducibility issues are raised by concerned members of a laboratory, the PI should resist the temptation to “kill the messenger” and should carefully consider the concerns of those who have the courage to speak out. When incorrect results are published, PIs should avoid concentrating all the blame on the offending co-worker: all co-authors are responsible for published results, including the PI.
We urge that supporting information providing adequate experimental detail be required for all publications. The most reputable chemistry journals have explicit requirements for characterization of new compounds, and it would be appropriate for them to expand their requirements with regard to experimental detail (or add them where currently there are no specific requirements). Referee report forms should include an explicit request for reviewers to comment on the adequacy of experimental detail. All journals should check papers for data manipulation.
The community should encourage publication in journals that maintain high experimental standards and thus increase the likelihood of reproducibility. This encouragement can come through the usual reward system (funding, promotion, awards).
Last but not least, reviewers need to be more than Roman emperors indicating with pollice verso whether articles are sufficiently novel and significant to deserve publication in a particular journal. Reviewers have a responsibility to carefully examine papers for adequacy of experimental detail and support for the conclusions. The recent publication of several papers describing fabricated results might have been avoided if the reviewers had performed a more careful analysis of the manuscripts.