Best Practices for Constructing, Preparing, and Evaluating Protein-Ligand Binding Affinity Benchmarks [Article v1.0]

David Hahn; Christopher Bayly; Melissa L. Boby; Hannah Bruce Macdonald; John Chodera; Vytautas Gapsys; Antonia Mey; David Mobley; Laura Perez Benito; Christina Schindler; Gary Tresadern; Gregory Warren

doi:10.33011/livecoms.4.1.1497

Authors

David F. Hahn Computational Chemistry, Janssen Research & Development, Turnhoutseweg 30, Beerse B-2340, Belgium https://orcid.org/0000-0003-2830-6880
Christopher I. Bayly OpenEye Scientific Software, 9 Bisbee Court, Suite D, Santa Fe, NM 87508 USA https://orcid.org/0000-0001-9145-6457
Melissa L. Boby Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065 USA https://orcid.org/0000-0003-1920-206X
Hannah E. Bruce Macdonald MSD R&D Innovation Centre, 120 Moorgate, London EC2M 6UR, United Kingdom https://orcid.org/0000-0002-5562-6866
John D. Chodera Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065 USA https://orcid.org/0000-0003-0542-119X
Vytautas Gapsys Computational Biomolecular Dynamics Group, Max Planck Institute for Biophysical Chemistry, Göttingen, Germany https://orcid.org/0000-0002-6761-7780
Antonia S.J.S. Mey EaStCHEM School of Chemistry, David Brewster Road, Joseph Black Building, The King's Buildings, Edinburgh, EH9 3FJ, UK https://orcid.org/0000-0001-7512-5252
David L. Mobley Departments of Pharmaceutical Sciences and Chemistry, University of California, Irvine, CA USA https://orcid.org/0000-0002-1083-5533
Laura Perez Benito Computational Chemistry, Janssen Research & Development, Turnhoutseweg 30, Beerse B-2340, Belgium https://orcid.org/0000-0001-9607-9048
Christina E.M. Schindler Computational Chemistry & Biology, Merck KGaA, Frankfurter Str. 250, 64289 Darmstadt, Germany https://orcid.org/0000-0002-8980-048X
Gary Tresadern Computational Chemistry, Janssen Research & Development, Turnhoutseweg 30, Beerse B-2340, Belgium https://orcid.org/0000-0002-4801-1644
Gregory L. Warren DeepCure, 131 Dartmouth St, Boston, MA 02116 USA https://orcid.org/0000-0003-4017-0162

DOI:

https://doi.org/10.33011/livecoms.4.1.1497

Abstract

Free energy calculations are rapidly becoming indispensable in structure-enabled drug discovery programs. As new methods, force fields, and implementations are developed, assessing their accuracy on real-world systems (benchmarking) becomes critical: it gives users an estimate of the accuracy to expect when these methods are applied within their domain of applicability, and it gives developers a way to assess the expected impact of new methodologies. These assessments require construction of a benchmark: a set of well-prepared, high-quality systems with corresponding experimental measurements, designed to ensure that the resulting calculations provide a realistic assessment of expected performance. The community has not yet adopted a common standardized benchmark, and existing benchmark reports suffer from a myriad of issues, including poor data quality, limited statistical power, and statistically deficient analyses, all of which can conspire to produce benchmarks that are poorly predictive of real-world performance. Here, we address these issues by presenting guidelines for (1) curating experimental data to develop meaningful benchmark sets, (2) preparing benchmark inputs according to best practices to facilitate widespread adoption, and (3) analyzing the resulting predictions to enable statistically meaningful comparisons among methods and force fields. We highlight challenges and open questions that remain to be solved in these areas, as well as recommendations for the collection of new datasets that might optimally serve to measure progress as methods become systematically more reliable. Finally, we provide a curated, versioned, open, standardized benchmark set that adheres to these standards (protein-ligand-benchmark) and an open-source toolkit for implementing standardized best-practice assessments (arsenic), both intended for the community to use as a common assessment tool.
While our main focus is benchmarking free energy methods based on molecular simulations, these guidelines should prove useful for assessment of the rapidly growing field of machine learning methods for affinity prediction as well.
