AI-Driven Drug Discovery: Why Molecular Dynamics Simulations Matter

January 9, 2024

Prediction: training data from molecular dynamics simulations will be key.

This is a reaction to the VentureBeat article “AI-driven drug discovery is poised to boom in 2024”.

Background story: I was part of the initial AlphaFold team at DeepMind, back in early 2016. We knew that ideally we would train a deep neural network end-to-end on a large dataset of mappings (amino-acid sequence -> 3D structure). However, the data in the Protein Data Bank was not enough to do it “easily”, out of the box. If only we could generate synthetic data at scale! We thought of using molecular dynamics simulations: we would spend a lot of compute time (OK, inside Google) and then distil that knowledge into the neural network, much like AlphaGo did. Unfortunately, even that was out of reach: molecular dynamics simulators could only fold relatively small proteins, and just setting up the infrastructure to do that at scale was hard, even for an elite research organisation. We ended up sticking to the experimental data available and adding extra unsupervised proteomics data (multiple sequence alignments). We also tried a lot of tricks to encode some physical priors into the neural network architecture, and to include some sort of search process guided by physics potentials.

Fast forward to 2024.

The challenges are now more ambitious: rather than “just” predicting the structure of proteins, drug design also requires predicting the interaction between proteins and small molecules (a combinatorial explosion!). In addition to that, you would like to know the location of the binding pocket, the 3D pose of both molecules, the binding affinity, and whether the binding induces any conformational changes in the protein structure. Training data for such processes might be harder to gather, and you will get a much sparser coverage of the state space of interest (i.e. the vast majority of protein-ligand pairs will never be in your experimental training data). The need for synthetic data generated by molecular dynamics simulations only gets bigger!

At Inductiva Research Labs we have been thinking about these kinds of problems (not only for molecular dynamics, but also for fluid dynamics, structural mechanics, plasmas, and eventually every domain of physical reality!). That’s why we are building a simulation platform that is super easy to use (just write a small Python script on your laptop) but arbitrarily scalable (simulations will run on the best hardware for that simulator and those simulation parameters, on a cluster or in the cloud). We envision a future where AI blends with scientific computing and expands the boundaries of what is possible to simulate and predict about Nature.

Reach out to us if you want to know more!

PS - congrats to my ex-colleagues at Google DeepMind and now Isomorphic Labs for the impressive progress in this area. They continue to be a great source of inspiration for me. 🙏🧬🚀