Apr - Anshul Kundaje
Speaker: Anshul Kundaje
Talk Title: “Deciphering cis-regulatory logic using interpretable deep learning models”
Thursday, April 11, 2019 6:00pm
Affiliation: Assistant Professor of Genetics and Computer Science, Stanford University
Website: Anshul Kundaje
Anshul Kundaje is an Assistant Professor of Genetics and Computer Science at Stanford University. The Kundaje lab develops interpretable machine learning methods to learn integrative models of gene regulation from large-scale functional genomics data and decipher the genetic and genomic basis of disease. Anshul completed his Ph.D. in Computer Science at Columbia University in 2008. As a postdoc at Stanford (2008-12) and a Research Scientist at MIT (2012-13), he led the integrative analysis efforts of the ENCODE Consortium and the Roadmap Epigenomics Project. Anshul is a recipient of the 2019 HUGO Chen Award of Excellence, the 2016 NIH Director’s New Innovator Award and the 2014 Alfred Sloan Foundation Fellowship.
Functional genomics experiments profiling genome-wide regulatory state have revealed millions of putative regulatory elements in diverse cell states. These massive datasets have spurred the development of Deep Neural Networks (DNNs) that can accurately map DNA sequence to associated cell-type-specific molecular phenotypes such as TF binding, chromatin accessibility, splicing and gene expression. I will present a critical overview of a variety of deep learning architectures, training and model evaluation strategies for learning predictive regulatory models from functional genomics profiles. Beyond high prediction accuracy, the primary appeal of DNNs is that they are capable of automatically learning predictive, biologically relevant patterns directly from raw data representations (e.g. raw DNA sequence) without many prior assumptions. I will present efficient interpretation engines for deep learning models to decipher nucleotide-resolution transcription factor binding events, consolidated TF motif representations, cooperative and epistatic motif interactions in cis-regulatory sequence grammars, dynamic cis and trans regulatory drivers of cellular differentiation and non-coding regulatory genetic variants.
Trainees are invited to meet with the VanBUG speaker for open discussion of both science and career paths. This takes place 5:00-5:45pm in either the Boardroom or Lunchroom on the ground floor of the BCCRC.
Introductory Speaker: William Casazza (PhD student, Sara Mostafavi lab, UBC)
“Genome-wide association studies (GWAS) have allowed us to discover many loci implicated in disease. However, the design of GWAS prevents us from directly estimating the causal effects of each locus, eliminating a straightforward way to learn a mutation’s mechanism of action. In previous studies, this has been overcome through the use of methods based on Mendelian randomization, which allows us to use genotype, and GWAS summary statistics, to infer the causal effect of a separate risk variable on disease. With multi-omic datasets, one can begin to reason about the causal role mutations play in regulating molecular traits, such as gene expression, without needing to draw power from studies of large cohorts. In recent work from our group, we used an established causal inference test (CIT) to infer instances whereby genetic associations with gene expression were mediated by molecular traits. However, with potentially hundreds of loci associated with a single gene, the use of one test per locus likely fails to account for the role of complex interactions between loci. In order to investigate the benefit of considering multiple loci in multi-omic causal inference, I apply methods for reducing genotype to a smaller set of latent variables, which I use in place of genotype in our previous mediation tests. I apply this method to a multi-omic dataset comprising 411 tissue samples from the dorsolateral prefrontal cortex of older individuals, all with H3K27 acetylation data, Illumina 450k array methylation data, and imputed genotypes derived from the Affymetrix GeneChip 6.0 platform. Using several latent variable methods, I explore whether or not accounting for several genotypes simultaneously can improve our ability to detect causal mediation in a multi-omic dataset.”
(This technology is brought to you by Compute Canada and WestGrid with support from PHSA Telehealth)