The Molecular ML Reading Group aims to explore foundational and recent applications of machine learning models to modeling molecular systems. The reading group is hosted by Rosetta Commons and organized by
Other AI in Chemistry Lecture Series
Every-other-week series on Diffusion and other molecular ML models
Yuxuan Song from the Institute for AI Industry Research at Tsinghua University joins us to discuss EquiFM, a method to generate molecules using flow matching. In contrast to diffusion/denoising score matching, which requires simulating the diffusion process, their method uses equivariant optimal transport to rapidly generate trajectories between samples from an easy-to-sample prior distribution and samples from the target distribution. By using equivariance, they make the learning objective more stable than diffusion score matching, and the generation trajectories straighter than those of typical linear interpolation-based flow matching (see the sketch below).
Here are the:
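To make the optimal transport idea concrete, here is a minimal sketch of OT-coupled flow matching on generic feature vectors. This is an illustration under assumed notation, not the authors' EquiFM code; `model(x, t)` is a hypothetical velocity network, and the equivariant, point-cloud-specific machinery is omitted.

```python
# Minimal sketch of optimal-transport flow matching (NOT the EquiFM code).
# Prior and data samples are paired with a minibatch OT assignment so the
# straight-line training paths do not cross, which straightens trajectories.
import torch
from scipy.optimize import linear_sum_assignment

def ot_pair(x0, x1):
    """Pair prior samples x0 with data samples x1, both (batch, dim),
    minimizing the total squared distance over the minibatch."""
    cost = torch.cdist(x0, x1) ** 2                 # pairwise squared distances
    rows, cols = linear_sum_assignment(cost.numpy())
    return x0[rows], x1[cols]

def flow_matching_loss(model, x1):
    """Conditional flow matching loss with a linear interpolant."""
    x0 = torch.randn_like(x1)                       # easy-to-sample Gaussian prior
    x0, x1 = ot_pair(x0, x1)                        # OT coupling
    t = torch.rand(x1.shape[0], 1)                  # random time in [0, 1]
    xt = (1 - t) * x0 + t * x1                      # point on the straight path
    v_target = x1 - x0                              # constant target velocity
    return ((model(xt, t) - v_target) ** 2).mean()
```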
Nima Shoghi joins us to discuss Joint Multi-domain Pre-training, exploring the feasibility of leveraging large-scale simulation data for chemical property and materials prediction tasks. Nima is currently a researcher at the High Performance Computer Architecture Lab at Georgia Tech and did the work he presented while an AI resident at Meta Fundamental AI Research (FAIR).
Here are the:
Matt O’Meara presented a chalk-talk-style introduction to Flow Matching, a simplified generalization of denoising diffusion models. The talk covers the background and basics of Flow Matching for molecular modeling; the core objective is summarized below.
This talk draws from the following material:
Here is the:
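As a pocket reference for the chalk talk, this is the conditional flow matching objective in its common Gaussian, linear-path form (standard notation, assumed here rather than taken from the talk slides):

```latex
% Regress a learned velocity field v_\theta onto the velocity of a simple
% conditional path, here the linear interpolant between prior and data:
%   x_t = (1 - t) x_0 + t x_1,  x_0 ~ N(0, I),  x_1 ~ p_data.
\mathcal{L}_{\mathrm{CFM}}(\theta) =
  \mathbb{E}_{t \sim \mathcal{U}[0,1],\; x_0 \sim \mathcal{N}(0, I),\; x_1 \sim p_{\mathrm{data}}}
  \bigl\| v_\theta(x_t, t) - (x_1 - x_0) \bigr\|^2
```

Sampling then amounts to integrating dx/dt = v_theta(x, t) from t = 0 to t = 1, starting from Gaussian noise.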
Ian Dunn from David Koes's lab at the University of Pittsburgh joined us to present Keypoint Diffusion.
Here are the:
Alex Chu from Po-Ssu Huang’s lab joined us to present Protpardelle.
Here are the:
This week Sidney Lisanza from the Baker Lab joined us to present ProteinGenerator.
Here are the:
This week Sarah Alamdari, a data scientist at Microsoft Research, presented EvoDiff.
Here are the:
This week Deniz Akpinaroglu from the Kortemme Lab at UCSF presented her recent work on Frame2seq.
Here are the:
This week Yeqing Lin from the AlQuraishi lab at Columbia joined us to discuss Genie, a diffusion-based model for protein structure generation that equivariantly diffuses over oriented residue clouds.
Here is the:
This week Patrick Bryant, from Frank Noé’s lab at FU Berlin and soon starting his own group at Stockholm University/Science for Life Laboratory, joined us to discuss Umol, a deep-learning-based ligand docking method. He gives some interesting details about training a variant of AlphaFold2 that considers small-molecule atoms, and about balancing the more abundant constraints from the multiple sequence alignment against the less abundant constraints on the ligand geometry.
Here is the:
This week Joey Bose, Tara Akhound-Sadegh, and colleagues join us to present FoldFlow (Bose et al., 2023), a conditional flow matching model for protein backbone generation. Using insights from differential geometry, they improve the training of flows on the Riemannian manifold over a collection of protein backbone frames (sketched below). They achieve SOTA performance on a number of metrics for non-pretrained de novo backbone generation.
Here is the:
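For context, a general form of the Riemannian flow matching objective that FoldFlow builds on (assumed notation; FoldFlow instantiates this on the manifold of rigid residue frames, SE(3)):

```latex
% Interpolate along geodesics via the exponential/logarithm maps and
% regress the learned velocity field in the manifold metric g:
x_t = \exp_{x_0}\!\bigl( t \, \log_{x_0}(x_1) \bigr), \qquad
\mathcal{L}(\theta) = \mathbb{E}_{t,\, x_0,\, x_1}
  \bigl\| v_\theta(x_t, t) - \dot{x}_t \bigr\|_{g_{x_t}}^{2}
```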
This week Rohith Krishna, a graduate student in the Baker Lab at UW, joins us to present RoseTTAFold All-Atom (Krishna, et al., 2023). This work generalizes the RoseTTAFold “three-track” architecture to handle non-protein molecules. They then use this for a range of applications, including re-training RFDiffusion to enable predicting ligand and cofactor binding.
Here is the
This week we discuss RFDiffusion (Watson, et al., 2023), a denoising diffusion model based on RoseTTAFold that generates realistic protein backbones. Nick Randolf gives an overview of the method and we discuss its details.
Here are the
This week we dug into FrameDiff (Yim, et al., 2023), a new diffusion model for protein backbone generation. In this work they develop the theory of diffusion/denoising machine learning over Riemannian manifolds. As a key application, they treat each backbone residue as a rigid frame, similar to AlphaFold2. Since each frame carries a 3D rotation and translation, the diffusion should be SE(3) equivariant. In contrast with RFDiffusion, which directly predicts coordinates, this work uses the stochastic differential equation / score matching formulation of diffusion/denoising developed in (Song, et al., 2021), summarized below; in particular, they work out the math for Brownian diffusion on this manifold. To test the model, they train directly on a curated subset of the Protein Data Bank (unlike RFDiffusion, which uses a pre-trained structure prediction module), and measure the designability, diversity, and novelty of FrameDiff-generated backbones. While it was difficult to compare directly with RFDiffusion because the RFDiffusion code was not yet released at the time, FrameDiff appears to be not quite as performant, but it is significantly faster and the model is only 1/4 the size.
Here are
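For orientation, these are the score-based generative modeling SDEs from (Song, et al., 2021) in their Euclidean form; FrameDiff's contribution is working out the analogous Brownian diffusion and score matching on the manifold of residue frames:

```latex
% Forward noising SDE and its time reversal (Song et al., 2021).
% A network s_\theta(x, t) \approx \nabla_x \log p_t(x) is trained by
% denoising score matching and plugged into the reverse SDE to sample.
\mathrm{d}x = f(x, t)\,\mathrm{d}t + g(t)\,\mathrm{d}w
  \qquad \text{(forward, data to noise)}

\mathrm{d}x = \bigl[ f(x, t) - g(t)^2 \nabla_x \log p_t(x) \bigr]\,\mathrm{d}t
  + g(t)\,\mathrm{d}\bar{w}
  \qquad \text{(reverse, noise to data)}
```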
Diffusion models are a deep learning architecture that has led to breakthroughs in generating realistic structured data such as images, text, and sound. Diffusion models have also been used to generate proteins and other biomolecular systems. In this first meeting, Nick gives an overview of the basics of the diffusion model architecture, including how to efficiently add noise to data, efficiently train the model by predicting the noise, and generate samples by iteratively removing the predicted noise (a minimal sketch follows below). We cover ideas developed in the following works:
Here are the
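To ground the overview, here is a minimal DDPM-style sketch of the three pieces mentioned: closed-form noising, training by noise prediction, and sampling by iterative denoising. Notation follows the common DDPM formulation (Ho et al., 2020); `model(x, t)` is an assumed noise-prediction network, not code from the talk.

```python
# Minimal DDPM sketch: noise in one jump, train to predict the noise,
# sample by iteratively removing the predicted noise.
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)       # linear noise schedule
alphas = 1.0 - betas
abar = torch.cumprod(alphas, dim=0)         # cumulative products \bar{alpha}_t

def noise(x0, t):
    """Jump directly to step t: x_t = sqrt(abar_t) x0 + sqrt(1 - abar_t) eps."""
    eps = torch.randn_like(x0)
    return abar[t].sqrt() * x0 + (1 - abar[t]).sqrt() * eps, eps

def loss(model, x0):
    """Train the network to predict the noise that was added."""
    t = torch.randint(0, T, (1,))
    xt, eps = noise(x0, t)
    return ((model(xt, t) - eps) ** 2).mean()

@torch.no_grad()
def sample(model, shape):
    """Ancestral sampling: start from pure noise, denoise step by step."""
    x = torch.randn(shape)
    for t in reversed(range(T)):
        eps_hat = model(x, torch.tensor([t]))
        x = (x - betas[t] / (1 - abar[t]).sqrt() * eps_hat) / alphas[t].sqrt()
        if t > 0:
            x = x + betas[t].sqrt() * torch.randn_like(x)
    return x
```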
Boston Protein Design and Modeling Club
ML for protein engineering seminar series