Cordaro, Nicholas J.; Andrew J. Kavran; Michael Smallegan; Megan Palacio; Nicolaus Lammer; Tyler S. Brant; Vanessa DuMont; Naiara Doherty Garcia; Suzannah Miller; Tara Jourabchi; Sara L. Sawyer and Aaron Clauset

Despite substantial standardization, polymerase chain reaction (PCR) experiments frequently fail, and troubleshooting failed PCRs can be costly in both time and money. Using a crowdsourced data set spanning 290 real PCRs from six active research laboratories, we investigate the degree to which PCR success rates can be improved by machine learning. While human-designed PCRs succeed at a rate of 55–63%, we find that a machine learning model can accurately predict reaction outcome 81% of the time. We then validate this level of improvement by using the model to guide the design and predict the outcome of 39 new PCR experiments. In addition to improving outcomes, the model identifies 15 features of PCRs that researchers optimized less well than the learned model did. These results suggest that PCR success rates can easily be improved by 17–26%, potentially saving millions of dollars and thousands of hours of researcher time each year across the scientific community. Other common laboratory methods may benefit from similar data-driven optimization efforts.