The Molecular ML Reading Group aims to explore foundational and recent applications of machine learning models to modeling molecular systems. The reading group is hosted by Rosetta Commons and organized by
Other AI in Chemistry Lecture Series
Every-other-week series on Diffusion and other molecular ML models
Yuxuan Song from the Institute for AI Industry Research at Tsinghua University joins us to discuss EquiFM, a method to generate molecules using flow matching. In contrast to diffusion/denoising score matching, which requires simulating the diffusion process, their method uses equivariant optimal transport to rapidly generate trajectories between samples from an easy-to-sample prior distribution and samples from the target distribution. By using equivariance, they make the learning objective more stable than diffusion score matching, and the generation trajectories straighter than those of typical linear interpolation-based flow matching (see the sketch below).
Here are the:
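To make the optimal transport idea concrete, here is a minimal sketch of OT-coupled flow matching on generic feature vectors. This is an illustration under assumed notation, not the authors' EquiFM code; `model(x, t)` is a hypothetical velocity network, and the equivariant, point-cloud-specific machinery is omitted.

```python
# Minimal sketch of optimal-transport flow matching (NOT the EquiFM code).
# Prior and data samples are paired with a minibatch OT assignment so the
# straight-line training paths do not cross, which straightens trajectories.
import torch
from scipy.optimize import linear_sum_assignment

def ot_pair(x0, x1):
    """Pair prior samples x0 with data samples x1, both (batch, dim),
    minimizing the total squared distance over the minibatch."""
    cost = torch.cdist(x0, x1) ** 2                 # pairwise squared distances
    rows, cols = linear_sum_assignment(cost.numpy())
    return x0[rows], x1[cols]

def flow_matching_loss(model, x1):
    """Conditional flow matching loss with a linear interpolant."""
    x0 = torch.randn_like(x1)                       # easy-to-sample Gaussian prior
    x0, x1 = ot_pair(x0, x1)                        # OT coupling
    t = torch.rand(x1.shape[0], 1)                  # random time in [0, 1]
    xt = (1 - t) * x0 + t * x1                      # point on the straight path
    v_target = x1 - x0                              # constant target velocity
    return ((model(xt, t) - v_target) ** 2).mean()
```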
Nima Shoghi joins us to discuss Joint Multi-domain Pre-training, exploring the feasibility of leveraging large-scale simulation data for chemical property and materials prediction tasks. Nima is currently a researcher at the High Performance Computer Architecture Lab at Georgia Tech and did the work he presented while an AI resident at Meta Fundamental AI Research (FAIR).
Here are the:
Matt O’Meara presented a chalk-talk-style introduction to Flow Matching, a simplified generalization of denoising diffusion models. The talk covers the background and basics of Flow Matching for molecular modeling; the core objective is summarized below.
This talk draws from the following material:
Here is the:
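As a pocket reference for the chalk talk, this is the conditional flow matching objective in its common Gaussian, linear-path form (standard notation, assumed here rather than taken from the talk slides):

```latex
% Regress a learned velocity field v_\theta onto the velocity of a simple
% conditional path, here the linear interpolant between prior and data:
%   x_t = (1 - t) x_0 + t x_1,  x_0 ~ N(0, I),  x_1 ~ p_data.
\mathcal{L}_{\mathrm{CFM}}(\theta) =
  \mathbb{E}_{t \sim \mathcal{U}[0,1],\; x_0 \sim \mathcal{N}(0, I),\; x_1 \sim p_{\mathrm{data}}}
  \bigl\| v_\theta(x_t, t) - (x_1 - x_0) \bigr\|^2
```

Sampling then amounts to integrating dx/dt = v_theta(x, t) from t = 0 to t = 1, starting from Gaussian noise.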
Ian Dunn from David Koes's lab at the University of Pittsburgh joined us to present Keypoint Diffusion.
Here are the:
Alex Chu from Po-Ssu Huang’s lab joined us to present Protpardelle.
Here are the:
This week Sidney Lisanza from the Baker Lab joined us to present ProteinGenerator.
Here are the:
This week Sarah Alamdari, a data scientist at Microsoft Research, presented EvoDiff.
Here are the:
This week Deniz Akpinaroglu from the Kortemme Lab at UCSF presented her recent work on Frame2seq.
Here are the:
This week Yeqing Lin from the AlQuraishi lab at Columbia joined us to discuss Genie, a diffusion-based model for protein structure generation that equivariantly diffuses over oriented residue clouds.
Here is the:
This week Patrick Bryant, from Frank Noé’s lab at FU Berlin and soon starting his own group at Stockholm University/Science for Life Laboratory, joined us to discuss Umol, a deep-learning-based ligand docking method. He gives some interesting details about training a variant of AlphaFold2 that considers small-molecule atoms, and about balancing the more abundant constraints from the multiple sequence alignment against the less abundant constraints on the ligand geometry.
Here is the:
This week Joey Bose, Tara Akhound-Sadegh, and colleagues join us to present FoldFlow (Bose et al., 2023), a conditional flow matching model for protein backbone generation. Using insights from differential geometry, they improve the training of flows on the Riemannian manifold over a collection of protein backbone frames (sketched below). They achieve SOTA performance on a number of metrics for non-pretrained de novo backbone generation.
Here is the:
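For context, a general form of the Riemannian flow matching objective that FoldFlow builds on (assumed notation; FoldFlow instantiates this on the manifold of rigid residue frames, SE(3)):

```latex
% Interpolate along geodesics via the exponential/logarithm maps and
% regress the learned velocity field in the manifold metric g:
x_t = \exp_{x_0}\!\bigl( t \, \log_{x_0}(x_1) \bigr), \qquad
\mathcal{L}(\theta) = \mathbb{E}_{t,\, x_0,\, x_1}
  \bigl\| v_\theta(x_t, t) - \dot{x}_t \bigr\|_{g_{x_t}}^{2}
```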
This week Rohith Krishna, a graduate student in the Baker Lab at UW, joins us to present RoseTTAFold All-Atom (Krishna, et al., 2023). This work generalizes the RoseTTAFold “three-track” architecture to handle non-protein molecules. They then use this for a range of applications, including re-training RFDiffusion to enable predicting ligand and cofactor binding.
Here is the
This week we discuss RFDiffusion (Watson, et al., 2023), a denoising diffusion model based on RoseTTAFold that generates realistic protein backbones. Nick Randolf gives an overview of the method and we discuss its details.
Here are the
This week we dug into FrameDiff (Yim, et al., 2023), a new diffusion model for protein backbone generation. In this work they develop the theory of diffusion/denoising machine learning over Riemannian manifolds. As a key application, they treat each backbone residue as a rigid frame, similar to AlphaFold2. Since each frame carries a 3D rotation and translation, the diffusion should be SE(3) equivariant. In contrast with RFDiffusion, which directly predicts coordinates, this work uses the stochastic differential equation / score matching formulation of diffusion/denoising developed in (Song, et al., 2021), summarized below; in particular, they work out the math for Brownian diffusion on this manifold. To test the model, they train directly on a curated subset of the Protein Data Bank (unlike RFDiffusion, which uses a pre-trained structure prediction module), and measure the designability, diversity, and novelty of FrameDiff-generated backbones. While it was difficult to compare directly with RFDiffusion because the RFDiffusion code was not yet released at the time, FrameDiff appears to be not quite as performant, but it is significantly faster and the model is only 1/4 the size.
Here are
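For orientation, these are the score-based generative modeling SDEs from (Song, et al., 2021) in their Euclidean form; FrameDiff's contribution is working out the analogous Brownian diffusion and score matching on the manifold of residue frames:

```latex
% Forward noising SDE and its time reversal (Song et al., 2021).
% A network s_\theta(x, t) \approx \nabla_x \log p_t(x) is trained by
% denoising score matching and plugged into the reverse SDE to sample.
\mathrm{d}x = f(x, t)\,\mathrm{d}t + g(t)\,\mathrm{d}w
  \qquad \text{(forward, data to noise)}

\mathrm{d}x = \bigl[ f(x, t) - g(t)^2 \nabla_x \log p_t(x) \bigr]\,\mathrm{d}t
  + g(t)\,\mathrm{d}\bar{w}
  \qquad \text{(reverse, noise to data)}
```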
Diffusion models are a deep learning architecture that has led to breakthroughs in generating realistic structured data such as images, text, and sound. Diffusion models have also been used to generate proteins and other biomolecular systems. In this first meeting, Nick gives an overview of the basics of the diffusion model architecture, including how to efficiently add noise to data, efficiently train the model by predicting the noise, and generate samples by iteratively removing the predicted noise (a minimal sketch follows below). We cover ideas developed in the following works:
Here are the
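To ground the overview, here is a minimal DDPM-style sketch of the three pieces mentioned: closed-form noising, training by noise prediction, and sampling by iterative denoising. Notation follows the common DDPM formulation (Ho et al., 2020); `model(x, t)` is an assumed noise-prediction network, not code from the talk.

```python
# Minimal DDPM sketch: noise in one jump, train to predict the noise,
# sample by iteratively removing the predicted noise.
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)       # linear noise schedule
alphas = 1.0 - betas
abar = torch.cumprod(alphas, dim=0)         # cumulative products \bar{alpha}_t

def noise(x0, t):
    """Jump directly to step t: x_t = sqrt(abar_t) x0 + sqrt(1 - abar_t) eps."""
    eps = torch.randn_like(x0)
    return abar[t].sqrt() * x0 + (1 - abar[t]).sqrt() * eps, eps

def loss(model, x0):
    """Train the network to predict the noise that was added."""
    t = torch.randint(0, T, (1,))
    xt, eps = noise(x0, t)
    return ((model(xt, t) - eps) ** 2).mean()

@torch.no_grad()
def sample(model, shape):
    """Ancestral sampling: start from pure noise, denoise step by step."""
    x = torch.randn(shape)
    for t in reversed(range(T)):
        eps_hat = model(x, torch.tensor([t]))
        x = (x - betas[t] / (1 - abar[t]).sqrt() * eps_hat) / alphas[t].sqrt()
        if t > 0:
            x = x + betas[t].sqrt() * torch.randn_like(x)
    return x
```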
Boston Protein Design and Modeling Club
ML for protein engineering seminar series