The Journey of Data: Lessons Learned in Modeling Kinase Affinity, Selectivity, and Resistance [Article v1.0]

Authors

  • Raquel López-Ríos de Castro In silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité-Universitätsmedizin Berlin, Germany; Data Driven Drug Design, Saarland University, Saarbrücken, Germany; Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, USA https://orcid.org/0000-0003-2668-7405
  • Jaime Rodríguez-Guerra In silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité-Universitätsmedizin Berlin, Germany and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, USA https://orcid.org/0000-0001-8974-1566
  • David Schaller In silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité-Universitätsmedizin Berlin, Germany and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, USA https://orcid.org/0000-0002-1881-4518
  • Talia B. Kimber In silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité-Universitätsmedizin Berlin, Germany and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, USA https://orcid.org/0000-0002-8881-920X
  • Corey Taylor In silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité-Universitätsmedizin Berlin, Germany and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, USA https://orcid.org/0000-0003-2535-9514
  • Jessica B. White Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, USA https://orcid.org/0000-0001-5748-8383
  • Michael Backenköhler Data Driven Drug Design, Saarland University, Saarbrücken, Germany https://orcid.org/0000-0002-7913-2932
  • Joschka Groß Neuro-Mechanistic Modeling, German Research Center for Artificial Intelligence, Saarbrücken, Germany https://orcid.org/0000-0002-8957-8807
  • Alexander M. Payne Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, USA https://orcid.org/0000-0003-0947-0191
  • Benjamin Kaminow Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, USA https://orcid.org/0000-0002-2266-3353
  • Ivan Pulido Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, USA https://orcid.org/0000-0002-7178-8136
  • Sukrit Singh Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, USA https://orcid.org/0000-0003-1914-4955
  • Paula Linh Kramer Data Driven Drug Design, Saarland University, Saarbrücken, Germany https://orcid.org/0009-0000-0302-8744
  • Guillermo Pérez-Hernández In silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité-Universitätsmedizin Berlin, Germany; Data Driven Drug Design, Saarland University, Saarbrücken, Germany; Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, USA https://orcid.org/0000-0002-9287-8704
  • Andrea Volkamer In silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité-Universitätsmedizin Berlin, Germany and Data Driven Drug Design, Saarland University, Saarbrücken, Germany https://orcid.org/0000-0002-3760-580X
  • John D. Chodera Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, USA https://orcid.org/0000-0003-0542-119X

DOI:

https://doi.org/10.33011/livecoms.6.1.3875

Keywords:

Structure-based ML, ML drug discovery, Kinases, Binding affinity predictions, Drug discovery framework, Protein-ligand binding, Data harmonization

Abstract

Recent advances in machine learning (ML) are reshaping drug discovery. Structure-based ML methods use physically inspired models to predict binding affinities from protein:ligand complexes. These methods promise to enable the integration of data for many related targets, which addresses data scarcity for single targets and could enable generalizable predictions for a broad range of targets, including mutations. In this work, we report our experiences in building KinoML, a novel framework for ML in target-based small molecule drug discovery with an emphasis on structure-enabled methods. KinoML currently focuses on kinases because the relative structural conservation of this protein superfamily, particularly in the kinase domain, makes it possible to leverage data from the entire superfamily to make structure-informed predictions about binding affinities, selectivities, and drug resistance. Key lessons learned in building KinoML include the importance of reproducible data collection and deposition, the harmonization of molecular data and featurization, and the selection of the right data format to ensure reusability and reproducibility of ML models. As a result, KinoML allows users to easily accomplish three tasks: accessing and curating molecular data; featurizing these data with representations suitable for ML applications; and running reproducible ML experiments that require access to ligand, protein, and assay information to predict ligand affinity. Although KinoML focuses on kinases, the framework can be applied to other proteins. The lessons reported here can help guide the development of platforms for structure-enabled ML in other areas of drug discovery.

A workflow for the KinoML project

Published

2025-11-19

How to Cite

López-Ríos de Castro, R., Rodríguez-Guerra, J., Schaller, D., Kimber, T. B., Taylor, C., White, J. B., … Chodera, J. D. (2025). The Journey of Data: Lessons Learned in Modeling Kinase Affinity, Selectivity, and Resistance [Article v1.0]. Living Journal of Computational Molecular Science, 6(1), 3875. https://doi.org/10.33011/livecoms.6.1.3875

Section

Articles