Oct - Caroline Colijn
Speaker: Caroline Colijn
Talk Title: “Beyond the SNP threshold: clustering and outbreak reconstruction with genomic data”
Thursday, October 11, 2018, 6:00pm
Affiliation: Professor & Canada 150 Research Chair, Department of Mathematics, Simon Fraser University
Web-site: Caroline Colijn (SFU)
Web-site: Caroline Colijn (ICL)
Caroline Colijn develops mathematical tools driven by applications in infection, evolution and public health. These range from reconstructing who infected whom with the help of sequence data, to using machine learning tools to predict successful and unsuccessful pathogen sub-groups on trees, to building predictive models to design intervention strategies. She has constructed metrics on both labelled and unlabelled phylogenetic trees, which help to uncover distinct alternative stories of the ancestry patterns among a set of taxa, and to compare evolution in different places and times (or to compare simulations to data). She also works on modelling diverse circulating infections, and has found that interactions between pathogen strains are a key determinant of their responses to human interventions. She is a founding member of Imperial College London’s Centre for the Mathematics of Precision Healthcare, and has recently moved to Simon Fraser University to take up a Canada 150 Research Chair in the Department of Mathematics.
As sequencing becomes more affordable, routine sequencing has the potential to influence the public health response to outbreaks. A key first step is typically to assign cases to clusters based on sequence similarity, for example grouping cases together if the sequences are within 12 SNPs of each other. Once clusters are identified, further investigation can be done to identify transmission events and reconstruct the outbreak to inform current or future control measures.
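As a rough illustration of the threshold rule described above, the following Python sketch groups cases whose sequences differ by at most a chosen number of SNPs. It is a minimal sketch under stated assumptions, not the speaker's method: it assumes the sequences are already aligned to equal length, reads "within 12 SNPs" as "at most 12 differences", and assumes single-linkage grouping (a case joins a cluster if it is close to any member). The function names and the toy sequences are hypothetical.

```python
from __future__ import annotations

from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Count positions where two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def threshold_clusters(sequences: dict[str, str], threshold: int) -> list[set[str]]:
    """Link two cases if their SNP distance is at most `threshold`,
    then return the connected components (single-linkage assumption)."""
    parent = {case: case for case in sequences}

    def find(case: str) -> str:
        # Follow parent pointers to the cluster representative,
        # shortening the path along the way.
        while parent[case] != case:
            parent[case] = parent[parent[case]]
            case = parent[case]
        return case

    for a, b in combinations(sequences, 2):
        if snp_distance(sequences[a], sequences[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters: dict[str, set[str]] = {}
    for case in sequences:
        clusters.setdefault(find(case), set()).add(case)
    return list(clusters.values())


# Toy data (hypothetical): A-B and B-C differ by 1 SNP, A-C by 2,
# and D differs from every other case by at least 3.
example_cases = {
    "A": "ACGTACGT",
    "B": "ACGTACGA",
    "C": "ACGTACCA",
    "D": "TTTTACGT",
}
example_clusters = threshold_clusters(example_cases, threshold=1)
```

With a threshold of 1, cases A and C end up in the same cluster even though they are 2 SNPs apart, because both are linked through B. This chaining effect is one reason a fixed threshold can be a blunt instrument.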
In this talk I will discuss an alternative to SNP thresholds for defining transmission clusters. Our approach is based on computing the probability that there were many intermediate transmissions separating two cases, and assigning cases to clusters according not only to the SNP distances between sequences but also to the number of intermediate transmissions. Next, I will describe a Bayesian approach to reconstructing an outbreak based on sequence data, timing information and clinical data, and I will outline our machine learning approach to including demographic, clinical and behavioural data in our analysis alongside sequences and clusters.
Trainees are invited to meet with the VanBUG speaker for open discussion of both science and career paths. This takes place 5:00-5:45pm in either the Boardroom or Lunchroom on the ground floor of the BCCRC.
Introductory Speaker: Tizian Schulz (PhD Student, Bielefeld University, Germany / Visiting Student, Faraz Hach Lab, Vancouver Prostate Centre)
Title: “Efficient querying of a pan-genome”
One of the most common tasks in computational genomics is the comparison of biological sequences, e.g. DNA sequences, to answer biological questions. Because few such sequences were available in the past, analyses were often limited to inter-species comparisons at the gene level. Nowadays, modern high-throughput sequencing technologies provide a continuously growing wealth of DNA sequences that allows the genome-wide comparison of thousands of individual sequences of the same species or any other taxonomic unit. Such a set of highly redundant and similar sequences is called a pan-genome. Pan-genomes can be stored as sequence graphs, which significantly reduces memory requirements and facilitates the comparison of the sequences involved. On the other hand, the analysis of a graph appears to be much more challenging than the comparison of a set of strings. In this talk, a new method is introduced for querying a pan-genome represented as a sequence graph, in the spirit of the famous alignment search algorithm BLAST.
(This technology is brought to you by Compute Canada and WestGrid with support from PHSA Telehealth)