Molecular ML Reading Group

The Molecular ML Reading Group explores foundational and recent applications of machine learning models to modeling molecular systems. The reading group is hosted by Rosetta Commons and organized by

  • Nick Randolph, a PhD candidate in the Kuhlman Lab at UNC Chapel Hill (nzrandol@unc.edu)
  • Matthew O’Meara, an Assistant Professor at the University of Michigan (maom@umich.edu).



Biomolecular ML Models Series

An every-other-week series on diffusion and other molecular ML models

  • Next meeting: 6/12/2024, 11 am–12 pm EDT (UTC-4)
  • Jiying Zhang will present SubGDiff, a molecular diffusion-based model that takes molecular subgraphs into account
  • Zoom link
  • Google Calendar



EquiFM (4/24/2024)

Yuxuan Song from the Institute for AI Industry Research at Tsinghua University joined us to discuss EquiFM, a method for generating molecules using flow matching. In contrast to diffusion/denoising score matching, which requires simulating the diffusion process, their method uses equivariant optimal transport to rapidly generate trajectories between samples from an easy-to-sample prior distribution and samples from a target distribution. By using equivariance, they make the learning objective more stable than diffusion score matching, and the learned trajectories straighter than those from typical linear interpolation-based flow matching.
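Below is a minimal sketch of the minibatch optimal-transport pairing idea behind flow matching methods like this one (a toy illustration with made-up Gaussian data, not the EquiFM code, which additionally enforces SE(3) equivariance):

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    rng = np.random.default_rng(0)
    prior = rng.normal(size=(16, 3))         # x0: easy-to-sample prior points
    target = rng.normal(size=(16, 3)) + 2.0  # x1: stand-in "data" samples

    # Pair prior and data samples by minimizing the squared-distance transport
    # cost; OT pairing yields straighter paths than random pairing.
    cost = ((prior[:, None, :] - target[None, :, :]) ** 2).sum(-1)
    row, col = linear_sum_assignment(cost)
    x0, x1 = prior[row], target[col]

    # Flow matching regression target along the straight-line interpolant.
    t = rng.uniform(size=(len(x0), 1))
    xt = (1 - t) * x0 + t * x1               # sample a point along each path
    v_target = x1 - x0                       # constant target velocity
    # A network v_theta(xt, t) would be trained to minimize
    # || v_theta(xt, t) - v_target ||^2 over minibatches.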



Joint Multi-domain Pre-training (4/10/2024)

Nima Shoghi joined us to discuss Joint Multi-domain Pre-training, exploring the feasibility of leveraging large-scale simulation data for chemical property and materials prediction tasks. Nima is currently a researcher at the High Performance Computer Architecture Lab at Georgia Tech and did the work he presented while an AI resident at Meta Fundamental AI Research (FAIR).

Here are the:



Introduction to Flow Matching (3/27/2024)

Matt O’Meara presented a chalk-talk-style introduction to Flow Matching, a simplified generalization of denoising diffusion models. The talk covered the background and basics of flow matching for molecular modeling.
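For reference, the basic conditional flow matching objective covered in the talk can be written as follows (notation mine): draw x_0 from an easy-to-sample prior p_0 and x_1 from the data distribution p_1, form the interpolant x_t = (1 - t) x_0 + t x_1, and regress a velocity network onto the constant target velocity:

    \mathcal{L}_{\mathrm{CFM}}(\theta)
      = \mathbb{E}_{t \sim \mathcal{U}[0,1],\; x_0 \sim p_0,\; x_1 \sim p_1}
        \big\| v_\theta(x_t, t) - (x_1 - x_0) \big\|^2

Sampling then integrates the learned ODE \dot{x} = v_\theta(x, t) from t = 0 to t = 1.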




Keypoint Diffusion (3/13/2024)

Ian Dunn from David Koes’s lab at the University of Pittsburgh joined us to present Keypoint Diffusion.



Protpardelle (2/28/2024)

Alex Chu from Po-Ssu Huang’s lab joined us to present Protpardelle.



ProteinGenerator (2/14/2024)

This week Sidney Lisanza from the Baker Lab joined us to present ProteinGenerator.



EvoDiff (1/31/2024)

This week Sarah Alamdari, a data scientist at Microsoft Research, presented EvoDiff.



Frame2seq (1/17/2024)

This week Deniz Akpinaroglu from the Kortemme Lab at UCSF presented her recent work on Frame2seq.

  • Structure-conditioned masked language models for protein sequence design generalize beyond the native sequence space. Deniz Akpinaroglu, Kosuke Seki, Amy Guo, Eleanor Zhu, Mark J. S. Kelly, Tanja Kortemme. DOI: 10.1101/2023.12.15.571823



GENIE (12/6/2023)

This week Yeqing Lin from the AlQuraishi lab at Columbia joined us to discuss Genie, a diffusion-based model for protein structure generation that equivariantly diffuses over oriented residue clouds.

  • Generating Novel, Designable, and Diverse Protein Structures by Equivariantly Diffusing Oriented Residue Clouds. Yeqing Lin, Mohammed AlQuraishi. DOI: 10.48550/arXiv.2301.12485



Umol (11/22/2023)

This week Patrick Bryant, from Frank Noé’s lab at FU Berlin and soon starting his own group at Stockholm University/Science for Life Laboratory, joined us to discuss Umol, a deep-learning-based ligand docking method. He gave some interesting details about training a variant of AlphaFold2 that considers small-molecule atoms, and about balancing the more abundant constraints from the multiple sequence alignment with the less abundant constraints on the ligand geometry.

  • Structure prediction of protein-ligand complexes from sequence information with Umol. Patrick Bryant, Atharva Kelkar, Andrea Guljas, Cecilia Clementi, and Frank Noé. DOI: 10.1101/2023.11.03.565471
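As a toy illustration of the constraint-balancing issue (hypothetical code, not Umol’s actual training loop): when protein terms vastly outnumber ligand terms, one simple remedy is to re-weight the sparse term so it is not swamped.

    # Hypothetical re-weighting heuristic (illustrative only; Umol's actual
    # loss weighting may differ).
    def total_loss(protein_loss, ligand_loss, n_protein_terms, n_ligand_terms):
        w_ligand = n_protein_terms / max(n_ligand_terms, 1)
        return protein_loss + w_ligand * ligand_loss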



FoldFlow (11/08/2023)

This week Joey Bose, Tara Akhound-Sadegh, and colleagues joined us to present FoldFlow (Bose et al., 2023), a conditional flow matching model for protein backbone generation. Using insights from differential geometry, they improve the training of flows on the Riemannian manifold over a collection of protein backbone frames. They achieve state-of-the-art performance on non-pretrained de novo backbone generation across a number of metrics.

  • SE(3)-Stochastic Flow Matching for Protein Backbone Generation. Avishek Joey Bose, Tara Akhound-Sadegh, Kilian Fatras, Guillaume Huguet, Jarrid Rector-Brooks, Cheng-Hao Liu, Andrei Cristian Nica, Maksym Korablyov, Michael Bronstein, Alexander Tong. DOI: 10.48550/arXiv.2310.16802
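Schematically, the Riemannian analogue of the flow matching objective replaces the straight-line interpolant with a geodesic (a paraphrase of the general Riemannian flow matching setup these methods build on, not FoldFlow’s exact loss). With x_t = \exp_{x_0}(t \log_{x_0}(x_1)) the geodesic interpolant between x_0 and x_1 on the manifold,

    \mathcal{L}(\theta)
      = \mathbb{E}_{t,\, x_0,\, x_1}
        \big\| v_\theta(x_t, t) - \dot{x}_t \big\|^2_{g_{x_t}}

where the norm is taken in the Riemannian metric g at x_t; for protein backbones the manifold is a product of SE(3) frames, one per residue.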



RoseTTAFold All-Atom (10/25/2023)

This week Rohith Krishna, a graduate student in the Baker Lab at UW, joined us to present RoseTTAFold All-Atom (Krishna et al., 2023). This work generalizes the RoseTTAFold “three-track” architecture to handle non-protein molecules. They then use it for a range of applications, including re-training RFDiffusion to enable prediction of ligand and cofactor binding.

  • Generalized Biomolecular Modeling and Design with RoseTTAFold All-Atom. Rohith Krishna, Jue Wang, Woody Ahern, Pascal Sturmfels, Preetham Venkatesh, Indrek Kalvet, Gyu Rie Lee, Felix S Morey-Burrows, Ivan Anishchenko, Ian R Humphreys, Ryan McHugh, Dionne Vafeados, Xinting Li, George A Sutherland, Andrew Hitchcock, C Neil Hunter, Minkyung Baek, Frank DiMaio, David Baker. DOI: 10.1101/2023.10.09.561603



RFDiffusion (10/11/2023)

This week we discussed RFDiffusion (Watson et al., 2023), a denoising diffusion model based on RoseTTAFold for generating realistic protein backbones. Nick Randolph gave an overview of the method and we discussed its details.

  • De novo design of protein structure and function with RFdiffusion. Joseph L. Watson, David Juergens, Nathaniel R. Bennett, Brian L. Trippe, Jason Yim, Helen E. Eisenach, Woody Ahern, Andrew J. Borst, Robert J. Ragotte, Lukas F. Milles, Basile I. M. Wicky, Nikita Hanikel, Samuel J. Pellock, Alexis Courbet, William Sheffler, Jue Wang, Preetham Venkatesh, Isaac Sappington, Susana Vázquez Torres, Anna Lauko, Valentin De Bortoli, Emile Mathieu, Sergey Ovchinnikov, Regina Barzilay, Tommi S. Jaakkola, Frank DiMaio, Minkyung Baek & David Baker. Nature 620, 1089–1100 (2023). DOI: 10.1038/s41586-023-06415-8



FrameDiff (9/27/2023)

This week we dug into (Yim et al., 2023), which introduces FrameDiff, a new diffusion model for protein backbone generation. In this work they develop the theory of diffusion/denoising machine learning over Riemannian manifolds. As a key application, they treat each backbone residue as a rigid frame, similar to AlphaFold2. Since each frame has rotational and translational symmetry in 3D, the diffusion should be SE(3) equivariant. In contrast with RFDiffusion, which directly predicts coordinates, this work uses the stochastic differential equation / score matching formulation of diffusion/denoising developed in (Song et al., 2021); in particular, they work out the math for Brownian diffusion on this manifold. To test the model, they train directly on a curated subset of the Protein Data Bank (unlike RFDiffusion, which uses a pre-trained structure prediction module) and measure the designability, diversity, and novelty of FrameDiff-generated backbones. While a direct comparison with RFDiffusion was difficult because the RFDiffusion code had not yet been released at the time, FrameDiff appears to be not quite as performant, but it is significantly faster and the model is only a quarter of the size.
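A minimal sketch of the score-matching ingredients for the translational part (a toy numpy illustration of a Brownian, variance-exploding forward process on R^3, not the FrameDiff code, which also handles the rotational SO(3) component):

    import numpy as np

    sigma, t = 1.0, 0.5                  # diffusion coefficient and time
    rng = np.random.default_rng(0)

    x0 = rng.normal(size=(8, 3))         # toy residue translations
    eps = rng.normal(size=x0.shape)
    xt = x0 + sigma * np.sqrt(t) * eps   # forward: x_t ~ N(x_0, sigma^2 t I)

    # Denoising score matching target: grad_x log p(x_t | x_0).
    score_target = -(xt - x0) / (sigma**2 * t)

    # A network s_theta(xt, t) regressed onto score_target can then drive
    # reverse-time Euler-Maruyama sampling:
    #   x <- x + sigma^2 * s_theta(x, t) * dt + sigma * sqrt(dt) * noise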



Introduction to Diffusion Models (9/13/2023)

Diffusion models are a class of deep learning models that have led to breakthroughs in generating realistic structured data such as images, text, and sound. Diffusion models have also been used to generate proteins and other biomolecular systems. In this first meeting, Nick gave an overview of the basics of the diffusion model architecture, including how to efficiently add noise to data, train the models efficiently by predicting the noise, and generate samples by iteratively removing the predicted noise (a minimal sketch follows the reference list below). We covered ideas developed in the following works:

  • Sohl-Dickstein et al. (2015) (https://arxiv.org/abs/1503.03585)
  • Song & Ermon (2019) (https://arxiv.org/abs/1907.05600)
  • Ho et al. (2020) (https://arxiv.org/abs/2006.11239)
  • Song et al. (2021) (https://arxiv.org/abs/2011.13456)
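Here is a minimal sketch of the steps mentioned above in the DDPM parameterization of Ho et al. (2020) (a toy numpy illustration, not the talk’s actual code):

    import numpy as np

    T = 1000
    beta = np.linspace(1e-4, 0.02, T)   # linear noise schedule
    alpha = 1.0 - beta
    alpha_bar = np.cumprod(alpha)
    rng = np.random.default_rng(0)

    def q_sample(x0, t):
        """Add noise in one jump: x_t = sqrt(abar_t) x0 + sqrt(1 - abar_t) eps."""
        eps = rng.normal(size=x0.shape)
        xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
        return xt, eps                  # eps is the regression target for eps_theta

    def p_sample(eps_theta, x, t):
        """One reverse step: subtract the predicted noise, re-add scaled noise."""
        mean = (x - beta[t] / np.sqrt(1.0 - alpha_bar[t]) * eps_theta(x, t)) \
               / np.sqrt(alpha[t])
        if t == 0:
            return mean
        return mean + np.sqrt(beta[t]) * rng.normal(size=x.shape)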


Other AI in Chemistry Lecture Series

VantAI

Valence Labs

Boston Protein Design and Modeling Club

  • BPDMC hosts a monthly meeting that begins with dinner and drinks, followed by an hour-long scientific program: seminars, tutorials and workshops, and the occasional moderated group discussion after a major breakthrough is announced. The scientific program is followed by an hour (or so) of additional discussion and networking.
  • YouTube

ML for protein engineering seminar series

  • A bi-weekly seminar series focused on recent work in machine learning for protein engineering.
  • YouTube